In the sorting notes (see Sorting Algorithms) we took advantage of a few ideas to show how to do basic benchmarking to compare the various approaches.
In this lab, you get your chance to learn a bit more about performance by comparing searches. The art of benchmarking is something that is easy to learn but takes a lifetime to master (to borrow a phrase from the famous Othello board game).
Most of the algorithms we cover in introductory courses tend to be polynomial in nature. That is, the execution time can be expressed as a polynomial function of the size of the data size \(n\). Examples include but are not limited to:
And there are way more than these shown here. As you progress in computing, you’ll come to know and appreciate these in greater detail.
In this lab, we’re going to look at a few different data structures and methods that perform searches on them and do empirical analysis to get an idea of how well each combination works. Contrasted with other labs where you had to write a lot of code, we’re going to give you some code to do all of the needed work but ask you to do the actual analysis and produce a basic table.
We’re going to measure the performance of data structures we have been learning about in lectures. For this lab, we’ll focus on:
In the interest of fairness, we are only going to look at the time it takes to perform the various search operations. We’re not going to count the time to randomly-generate the data and actually build the data structure. The reasoning is straightforward. We’re interested in the search time, which is completely independent of other aspects that may be at play. We’re not at all saying that the other aspects are unimportant but want to keep the assignment focus on search.
The experimental apparatus that we are constructing will do the following for each of the cases:
You will probably find it convenient to start from our Arrays MonoDevelop solution. You can find this in projects/Arrays. You need this entire folder.
To make your life easier, we have put together a project within this solution, PerformanceLab that contains the code for all of the experiments you need to run. (That’s right, we’re giving you the code for the experiments, but you’re going to write some code to run the various experiments and then run for varying sizes of n.)
Here is the code for the first experiment, to test the performance of linear searching on integer arrays:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | public static long ExperimentIntArrayLinearSearch (int n, int rep, int seed)
{
Stopwatch watch = new Stopwatch ();
int[] data = new int[n];
Sorting.IntArrayGenerate (data, seed);
watch.Reset ();
watch.Start ();
int m = Math.Max(1, n/rep);
// perform the rep lookups
for (int k=0, i=0; k < rep; k++, i=(i+m)%n) {
Searching.IntArrayLinearSearch (data, data [i]);
}
watch.Stop ();
return watch.ElapsedMilliseconds;
}
|
Let’s take a quick look at how this experiment is constructed. We’ll also take a look at the other experiments but these will likely be presented in a bit less detail, except to highlight the differences:
Each of the other experiments is constructed similarly. For linear search and binary search we use the methods created earlier. For the lists and the set we use the built-in Contains method to search. The list and set are directly initialized in their constructors from the array data.
You need to fill in the Main method:
Write the code to parse command line args for the parameters rep and any number of values for n. For instance:
mono PerformanceLab.exe 50 1000 10000 100000
would generate the table shown below for 50 repetitions for each of the values of n, 1000, 10000, and 100000.
Write the code to run each of the experiments for rep and a given value of n.
Iterate through the values of n and print a table, something like the following, with the number of seconds calculated. Experiment and adjust the repetitions to get perceptible values. Our choice of 50 may not be appropriate with these n values.
n rep linear binary list set
1000 50 ??.??? ??.??? ??.??? ??.???
10000 50 ??.??? ??.??? ??.??? ??.???
100000 50 ??.??? ??.??? ??.??? ??.???
The table would be longer if more values of n were entered on the command line.