
IDictionary Options – Performance Test – SortedList vs. SortedDictionary vs. Dictionary vs. Hashtable


http://blog.bodurov.com/Performance-SortedList-SortedDictionary-Dictionary-Hashtable

Please note that the advantage of Hashtable over the generic Dictionary for insert and search operations demonstrated here exists because the tests are based on the NON-generic IDictionary interface, so each insert or search action is accompanied by a check of
the key type. For more information see Sean's comment below. (Thanks Sean!) Without that overhead, Dictionary performs better than Hashtable. The rest of the test conclusions are not affected. So to summarize: the generic Dictionary is the absolute winner
for insert and search operations when used through the generic dictionary interface or directly.
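
To make the distinction concrete, here is a minimal sketch (not part of the original test) of the two ways the same Dictionary can be called; the non-generic path is what the test below exercises:

using System.Collections;
using System.Collections.Generic;

class InterfaceOverheadSketch{
    static void Main(){
        Dictionary<string, string> dict = new Dictionary<string, string>();

        // Non-generic view: key and value travel as object, so every call
        // must verify at runtime that the key really is a string.
        IDictionary untyped = dict;
        untyped.Add("key1", "value1");
        object fromUntyped = untyped["key1"];

        // Generic view (or direct use): types are checked at compile time,
        // so no per-call verification is needed.
        IDictionary<string, string> typed = dict;
        typed.Add("key2", "value2");
        string fromTyped = typed["key2"];
    }
}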

This is a sequence of tests comparing the performance results for four different implementations of IDictionary: the generic Dictionary, the generic SortedDictionary, the old non-generic Hashtable, and the (also non-generic) SortedList.

I performed several tests, comparing the following parameters: memory used in bytes, time for insertion in ticks, time for item search in ticks, and time for looping with foreach in ticks. The test was performed 8000 times across the four implementations,
and the order was random, so each implementation was tested at least 1000 times.

I performed the tests in five stages, to observe the relationship between the number of entries and the performance. In the first stage the collections had 50 items, in the second 500, in the third 5,000, in the fourth 50,000, and in the fifth 100,000 items.

In this particular test, lower numbers for memory usage or execution time mean better performance. So if we want to present a visual performance chart for each of the parameters, we have to derive a performance coefficient from the raw data.
I used the following formula to calculate the performance coefficient:

Performance Coefficient for value x = min(value 1, value 2, … value n) / value x

This way, because the best performing value is the lowest one, it is transformed into a performance coefficient of 1, and any other value becomes a fraction of that.
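
As a small illustration (a sketch of mine, not the test code, using the average search ticks from the 50-item results below), the coefficient can be computed like this:

using System;

class CoefficientSketch{
    static void Main(){
        // Average search ticks for Dictionary, SortedDictionary,
        // Hashtable and SortedList in the 50-item stage.
        decimal[] values = { 126.40m, 193.05m, 105.36m, 167.89m };

        decimal min = values[0];
        foreach(decimal v in values){
            if(v < min) min = v;
        }

        // The best (lowest) value maps to 1.00; every other value
        // becomes a fraction of 1, exactly as in the charts.
        foreach(decimal v in values){
            Console.WriteLine("{0,8:N2} -> coefficient {1:N2}", v, min / v);
        }
    }
}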

This is the chart of the memory usage:

Memory Allocation

The results stay consistent as the number of items in the collections increases. The best memory footprint is seen with SortedList, followed by Hashtable and then SortedDictionary, while Dictionary has the highest
memory usage. Despite all that, we have to note that the differences are not significant, and unless your solution requires extreme sensitivity about memory usage you should consider the other two parameters, time taken for insert operations and time
taken for searching a key, as more important. It is also important to note that this test does not take into consideration the effects of garbage collection and thus can only be taken in a very general way.

This is the chart for the time taken for insert operations:

Insertion

When the number of records is small the differences between the four implementations are not significant, but as the number of items in the collection grows the performance of SortedList drops dramatically. SortedDictionary is
better, but still takes significantly more time for inserts than the other two implementations. Hashtable is next in the list, and the ultimate leader is the generic Dictionary.
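
The SortedList drop is easy to reproduce in isolation. A minimal sketch (again mine, not part of the measured code), relying on the fact that the non-generic SortedList keeps its keys in a sorted array, so any insert that does not land at the end has to shift the elements after it:

using System;
using System.Collections;
using System.Diagnostics;

class SortedListInsertSketch{
    static void Main(){
        const int N = 50000;
        SortedList ascending = new SortedList(N);
        SortedList descending = new SortedList(N);

        // Keys arriving already in order: every insert appends at the end.
        Stopwatch watch = Stopwatch.StartNew();
        for(int i = 0; i < N; i++) ascending.Add(i, i);
        Console.WriteLine("ascending:  {0} ticks", watch.ElapsedTicks);

        // Keys arriving in reverse order: every insert shifts the
        // whole underlying array by one position.
        watch = Stopwatch.StartNew();
        for(int i = N - 1; i >= 0; i--) descending.Add(i, i);
        Console.WriteLine("descending: {0} ticks", watch.ElapsedTicks);
    }
}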

This is the chart for the time taken for search operations:

Searching

The absolute leader is Hashtable, but the test does not consider the type of item being stored; that could be a subject for a future test. The next best performer is the generic Dictionary, followed by the other two implementations.
The differences here between SortedList and SortedDictionary are not significant.
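
One caveat when reading the search numbers: through the non-generic IDictionary indexer a missing key returns null instead of throwing, so a lookup that misses is still timed as a "search" (and in the code below the planted key "C_key"+SearchIndex only exists while SearchIndex is smaller than NumberInsertedKeys). A minimal sketch of the difference (mine, not from the test):

using System.Collections;
using System.Collections.Generic;

class LookupSketch{
    static void Main(){
        Dictionary<string, string> dict = new Dictionary<string, string>();
        dict.Add("C_key27", "value27");

        // Non-generic view: a miss quietly yields null, no exception.
        IDictionary untyped = dict;
        object maybe = untyped["no_such_key"]; // null

        // Generic view: the miss is reported explicitly.
        string value;
        bool found = dict.TryGetValue("no_such_key", out value); // false
    }
}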

This is the chart for the time taken for foreach loop operations over the collections:

For Each

Here the leader is SortedList, then Dictionary, consistently better than Hashtable, and the worst performer is SortedDictionary.
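
Part of the explanation is in what foreach yields: through the non-generic interface each element comes back as a DictionaryEntry boxed into object, while the generic Dictionary enumerated directly yields strongly typed KeyValuePair structs. A minimal sketch (not from the post):

using System;
using System.Collections;
using System.Collections.Generic;

class EnumerationSketch{
    static void Main(){
        Dictionary<string, string> dict = new Dictionary<string, string>();
        dict.Add("a", "1");

        // Non-generic view: IDictionaryEnumerator hands back each
        // element as a DictionaryEntry boxed into object.
        IDictionary untyped = dict;
        foreach(DictionaryEntry entry in untyped){
            Console.WriteLine("{0}={1}", entry.Key, entry.Value);
        }

        // Direct generic enumeration: strongly typed pairs, no boxing.
        foreach(KeyValuePair<string, string> pair in dict){
            Console.WriteLine("{0}={1}", pair.Key, pair.Value);
        }
    }
}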

Here you can see the task manager during the test; it shows us a picture of the memory usage. Area A is the insertion phase, when more and more memory is being allocated. Area B is from the end of the insertion until the garbage collection of the objects.

Memory - Task Manager

This is the code I used for this test. The constant NumberInsertedKeys was changed to 50, 500, 5000, 50000, or 100000 for the different stages.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
/*
to compile enter in the command prompt
(the code uses C# 3.0 features such as var and
object initializers, so the v3.5 compiler is needed):
C:\WINDOWS\Microsoft.NET\Framework\v3.5\csc IDictTest.cs
to run enter in the command prompt:
IDictTest
*/
namespace IDictTest{

    public class RunResult{
        public decimal MemoryUsed;
        public decimal InsertTicks;
        public decimal SearchTicks;
        public decimal ForEachTicks;
    }

    public class Program
    {

        private static int SearchIndex = 27; // index of the key planted by GetRandomLetter; incremented before each run
        //private const int NumberInsertedKeys = 50;
        //private const int NumberInsertedKeys = 500;
        //private const int NumberInsertedKeys = 5000;
        private const int NumberInsertedKeys = 50000;
        //private const int NumberInsertedKeys = 100000;
        private const int NumberTests = 8000;

        private static readonly string[] Letters = 
                {"A","B","C","D","E","F","G","H","I","J"};

        public static void Main(string[] args)  {
            try{
            // TRY STARTS HERE ----------

                List<RunResult> listDictionary = new List<RunResult>();
                List<RunResult> listSortedDictionary = new List<RunResult>();
                List<RunResult> listHashtable = new List<RunResult>();
                List<RunResult> listSortedList = new List<RunResult>();
                Stopwatch watch = Stopwatch.StartNew();

                // a single Random instance; creating one per iteration
                // can repeat seeds and skew the random distribution
                Random rand = new Random();
                for(int i = 0; i < NumberTests; i++){
                    SearchIndex += 1;
                    int randInt = rand.Next(0, 4);
                    if(randInt == 0){
                      listDictionary.Add(
                          Test("Dictionary", new Dictionary<string, string>(), i));
                    }else if(randInt == 1){
                      listSortedDictionary.Add(
                          Test("SortedDictionary", 
                              new SortedDictionary<string, string>(), i));
                    }else if(randInt == 2){
                      listHashtable.Add(
                          Test("Hashtable", new Hashtable(), i));
                    }else if(randInt == 3){
                      listSortedList.Add(
                          Test("SortedList", new SortedList(), i));
                    }
                }

                Console.Clear();
                Msg("Time taken (minutes): {0} or about {1} minutes and {2} seconds", 
                    watch.Elapsed.TotalMinutes,
                    watch.Elapsed.Minutes,
                    watch.Elapsed.Seconds);
                
                RunResult resultDict = CalculateAvg(listDictionary);
                RunResult resultSortDict = CalculateAvg(listSortedDictionary);
                RunResult resultHash = CalculateAvg(listHashtable);
                RunResult resultSortList = CalculateAvg(listSortedList);
                
                RunResult min = 
                    new RunResult{
                        MemoryUsed = Math.Min(Math.Min(Math.Min(resultDict.MemoryUsed, resultSortDict.MemoryUsed),resultHash.MemoryUsed),resultSortList.MemoryUsed), 
                        InsertTicks = Math.Min(Math.Min(Math.Min(resultDict.InsertTicks, resultSortDict.InsertTicks), resultHash.InsertTicks), resultSortList.InsertTicks), 
                        SearchTicks = Math.Min(Math.Min(Math.Min(resultDict.SearchTicks, resultSortDict.SearchTicks), resultHash.SearchTicks), resultSortList.SearchTicks),
                        ForEachTicks = Math.Min(Math.Min(Math.Min(resultDict.ForEachTicks, resultSortDict.ForEachTicks), resultHash.ForEachTicks), resultSortList.ForEachTicks)
                    }; 
                
                // print the results
                PrintResults(resultDict, listDictionary.Count, min, "Dictionary");
                PrintResults(resultSortDict, listSortedDictionary.Count, min, "SortedDictionary");
                PrintResults(resultHash, listHashtable.Count, min, "Hashtable");
                PrintResults(resultSortList, listSortedList.Count, min, "SortedList");

            // TRY ENDS HERE ----------

            }catch(Exception ex){
                Msg("{0}", ex);
            }
        }
        
        private static RunResult CalculateAvg(List<RunResult> list){
            decimal sumMemory = 0;
            decimal sumInsert = 0;
            decimal sumSearch = 0;
            decimal sumForEach = 0;
            for(int i = 0; i < list.Count; i++){
                RunResult curr = list[i];
                sumMemory += curr.MemoryUsed;
                sumInsert += curr.InsertTicks;
                sumSearch += curr.SearchTicks;
                sumForEach += curr.ForEachTicks;
                // uncomment to print each line
                //Msg("{0,11} {1,13} {2,14}", 
                    //curr.MemoryUsed, curr.InsertTicks, curr.SearchTicks);
            }
            return new RunResult{
                      MemoryUsed = sumMemory / list.Count, 
                      InsertTicks = sumInsert / list.Count, 
                      SearchTicks = sumSearch / list.Count,
                      ForEachTicks = sumForEach / list.Count,
                    };
        }

        private static void PrintResults(RunResult result, int count, RunResult min, string name){
            Msg("--------- Results for {0}", name);
            Msg("# Tests {0}", count);
            Msg("Memory Used    Insert Ticks    Search Ticks    ForEach Ticks");
            Msg("Average Values:");
            Msg("{0,11:N} {1,13:N} {2,14:N} {3,14:N}", 
                result.MemoryUsed, 
                result.InsertTicks, 
                result.SearchTicks, 
                result.ForEachTicks);
            Msg("Performance Coefficient:");
            Msg("{0,11:N} {1,13:N} {2,14:N} {3,14:N}", 
                min.MemoryUsed/result.MemoryUsed, 
                min.InsertTicks/result.InsertTicks, 
                min.SearchTicks/result.SearchTicks, 
                min.ForEachTicks/result.ForEachTicks);
            Msg("");
        }

        private static void Msg(string name, params object[] args){
            Console.WriteLine(name, args);
        }

        private static RunResult Test(string name, IDictionary dict, int n){
            Console.Clear();
            Msg("Currently executing test {1} of {2} for {0} object", 
                name, n + 1, NumberTests);
            RunResult rr = new RunResult();
            Stopwatch watch;
            Random rand = new Random( );
            long memoryStart = System.GC.GetTotalMemory(true);
            long insertTicksSum = 0;
            for(int i = 0; i < NumberInsertedKeys; i++){
                string key = GetRandomLetter(rand, i)+"_key"+i;
                string value = "value"+i;
                
                watch = Stopwatch.StartNew();
                dict.Add(key, value);
                watch.Stop();
                
                insertTicksSum += watch.ElapsedTicks;                
            }
            rr.MemoryUsed = System.GC.GetTotalMemory(true) - memoryStart;
            

            rr.InsertTicks = insertTicksSum;

            watch = Stopwatch.StartNew();
            object searchResult = dict["C_key"+SearchIndex];
            watch.Stop();
            
            rr.SearchTicks = watch.ElapsedTicks;
            
            watch = Stopwatch.StartNew();
            foreach(var curr in dict){}
            watch.Stop();

            rr.ForEachTicks = watch.ElapsedTicks;

            return rr;
        }

        private static string GetRandomLetter(Random rand, int i){
            if(i == SearchIndex){
                return "C";
            }
            return Letters[rand.Next(0, 10)];
        }

    }

}

The computer used for the test has the following characteristics:

Processor: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz
Memory (RAM): 2.00 GB
System type: 32-bit Operating System
Windows Vista

This is the raw data. Memory Used is in bytes, and the insert, search, and looping times are in ticks.

8000 tests, 50 collection entries

Time taken (minutes): 0.324318411666667 or about 0 minutes and 19 seconds
--------- Results for Dictionary
# Tests 2063
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
   4,748.48      3,019.60         126.40         198.21
Performance Coefficient:
       0.73          0.92           0.83           0.80

--------- Results for SortedDictionary
# Tests 2063
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
   4,345.04      5,000.54         193.05         377.99
Performance Coefficient:
       0.80          0.55           0.55           0.42

--------- Results for Hashtable
# Tests 2063
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
   4,096.38      2,775.27         105.36         209.91
Performance Coefficient:
       0.85          1.00           1.00           0.75

--------- Results for SortedList
# Tests 2063
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
   3,488.50      4,350.48         167.89         158.41
Performance Coefficient:
       1.00          0.64           0.63           1.00

8000 tests, 500 collection entries

Time taken (minutes): 1.26410570333333 or about 1 minutes and 15 seconds
--------- Results for Dictionary
# Tests 2158
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
  53,353.52     28,345.86         123.94       1,051.16
Performance Coefficient:
       0.73          0.96           0.92           0.71

--------- Results for SortedDictionary
# Tests 2158
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
  48,949.40     67,007.68         232.22       2,562.93
Performance Coefficient:
       0.80          0.41           0.49           0.29

--------- Results for Hashtable
# Tests 2158
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
  48,053.46     27,341.10         114.07       1,303.74
Performance Coefficient:
       0.81          1.00           1.00           0.57

--------- Results for SortedList
# Tests 2158
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
  39,077.27     54,255.06         209.31         744.81
Performance Coefficient:
       1.00          0.50           0.54           1.00

8000 tests, 5000 collection entries

Time taken (minutes): 11.14550681 or about 11 minutes and 8 seconds
--------- Results for Dictionary
# Tests 1578
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
 527,352.00    274,339.83         171.81       9,724.85
Performance Coefficient:
       0.80          0.98           0.85           0.72

--------- Results for SortedDictionary
# Tests 1578
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
 498,944.56    853,251.69         288.00      22,777.03
Performance Coefficient:
       0.85          0.32           0.50           0.31

--------- Results for Hashtable
# Tests 1578
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
 480,048.17    268,945.01         145.19      11,911.69
Performance Coefficient:
       0.88          1.00           1.00           0.59

--------- Results for SortedList
# Tests 1578
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
 424,512.12    913,408.34         247.98       7,011.65
Performance Coefficient:
       1.00          0.29           0.59           1.00

8000 tests, 50000 collection entries

Time taken (minutes): 299.723793791667 or about 4 hours, 59 minutes and 43 seconds
--------- Results for Dictionary
# Tests 1574
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
5,427,604.19  2,818,130.50         302.05     100,556.45
Performance Coefficient:
       0.82          1.00           0.87           0.73

--------- Results for SortedDictionary
# Tests 1574
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
5,318,944.00 10,506,892.94         525.14     294,219.13
Performance Coefficient:
       0.84          0.27           0.50           0.25

--------- Results for Hashtable
# Tests 1574
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
5,005,100.00  2,867,786.62         263.39     111,733.71
Performance Coefficient:
       0.89          0.98           1.00           0.66

--------- Results for SortedList
# Tests 1574
Memory Used    Insert Ticks    Search Ticks    ForEach Ticks
Average Values:
4,443,275.91 82,146,134.48         493.06      73,798.76
Performance Coefficient:
       1.00          0.03           0.53           1.00


Moritz Kroll
Posted on 1/3/2008 11:45:36 AM

Thanks for these interesting measurements! 
I was exactly looking for this kind of information. 
People considering which collection to use, might also be interested in the MSDN page "SortedList and SortedDictionary Collection Types" (http://msdn2.microsoft.com/en-us/library/5z658b67.aspx),
which tells us the upper bounds of some operations on these collections in big O notation.

Vinit
Posted on 4/25/2008 1:17:46 AM

Thanks for the above test! 
I was also looking for this option. 
I guess I will go with Hashtable, as it looks the fastest of the lot.

Flavius
Posted on 6/11/2008 5:18:08 AM

Thanks! I just started to think about what to use and then found your article. It's good!

Flavius
Posted on 6/11/2008 5:55:46 AM

You forgot to add measure units (bytes) for memory allocation chart.

Vladimir Bodurov
Posted on 6/11/2008 8:15:29 AM

Thanks Flavius, I've added that now.

Gary Chojnowski
Posted on 8/12/2008 8:37:12 PM

I found your results useful. However, it may be important to point out that the hashtable does not support generics.

Tanveer Ansari
Posted on 8/25/2008 6:23:02 AM

Thanks for an article with immediate practical use. It helped in finalizing on which data structure to use for my app. 
As Gary has pointed out above, the generic Dictionary has the advantage of type safety, so it would seem that we should also factor in the unboxing overhead for the Hashtable's value object.

In my test of retrieval time using the Hashtable with unboxing vs. the generic Dictionary, the Hashtable still performed roughly three times as fast as the generic Dictionary.

--Time in ticks-- 
Time taken to fetch all elements of Dictionary<> was 10559 
Time taken to fetch all elements of hashtable without unboxing was 1431 
Time taken to fetch all elements of hashtable WITH unboxing was 2846
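
A minimal sketch of what "with unboxing" means here (Tanveer's actual test code was not posted): value types stored in a Hashtable are boxed to object, and reading them back as their real type requires an unboxing cast, whereas the generic Dictionary stores the value type directly.

using System.Collections;
using System.Collections.Generic;

class UnboxingSketch{
    static void Main(){
        Hashtable table = new Hashtable();
        table.Add("answer", 42);            // the int is boxed into an object

        object boxed = table["answer"];     // fetch without unboxing
        int unboxed = (int)table["answer"]; // fetch WITH the unboxing cast

        Dictionary<string, int> dict = new Dictionary<string, int>();
        dict.Add("answer", 42);             // stored as a plain int
        int direct = dict["answer"];        // no boxing or casting involved
    }
}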

JR
Posted on 2/15/2009 1:27:20 PM

This article was awesome! Thanks for sharing your findings with us. It was very helpful to the code I am writing.

One thing that would have been nice to see would have been insertion time for the sorted collections in the case that the strings were already sorted, as that happens sometimes.

Thanks!

Jeffery
Posted on 4/17/2009 12:28:42 AM

Hi,

I just did the same test but using ASP.NET 3.5 and IIS 7.0, and the results seem to indicate otherwise: Dictionary is better than SortedDictionary!

Take a look at http://jefferytay.wordpress.com/2009/04/16/performance-of-generics-sorteddictionary-and-dictionary/
for specifics

Vladimir Bodurov
Posted on 4/17/2009 12:47:25 AM

Hi Jeffery,

First of all, thank you for the excellent work you have done! I like the attitude of never trusting any source on the Internet and always checking for yourself. Undoubtedly that's the way to go.

However, there is one problem that I see with your test. You always process SortedDictionary before Dictionary. If you read the description of my tests, I tried those in a different order, and, more importantly, each was performed separately. When you
start performing operations on the SortedDictionary you trigger actions of the virtual machine related to large memory allocation, and then when you reach the Dictionary the hard work is already done. I would encourage you to try the same experiment in a different
order. But it is true that my test was with .NET 2.0, so it may have changed in 3.5.

ET
Posted on 4/23/2009 1:00:30 PM

I think you're missing an important variable/dimension in your tests - the number of items in the collection.

Vladimir Bodurov
Posted on 4/23/2009 1:14:18 PM

Yes, you are right. In the coming days I will make a revision of the test, as it seems, as pointed out by Jeffery, that tests done with a bigger sample show Dictionary performing better than SortedDictionary.

Jeffery
Posted on 4/23/2009 8:16:50 PM

Hi Vladimir,

Actually, based on your results, I got my team to convert all Dictionary objects to SortedDictionary in their development, because I remembered that Dictionary does not maintain any sort order, while SortedDictionary, which uses a
red-black tree, improves searching speed.

However, while I was doing some coding, I realized that for some weird reason my new code seemed to be performing much slower; that's when I decided to take our code for a test run :)

Vladimir Bodurov
Posted on 4/23/2009 9:43:43 PM

Sorry man, my data was just not big enough, and when I was doing it something influenced SortedDictionary to perform better than Dictionary, and because of the small sample that wasn't normalized. I'll soon have results with a very large sample, but my preliminary
data shows that you seem to be right and Dictionary performs better than SortedDictionary.

Jeffery
Posted on 4/29/2009 2:32:08 PM

Hi Vladimir,

Thanks for your extremely detailed test, which tallies with mine. Now it just makes me wonder: since Dictionary is better than SortedDictionary, why is the latter still there? Any ideas?

Vladimir Bodurov
Posted on 4/29/2009 3:20:00 PM

Because if you do a foreach iteration over a SortedDictionary the keys will come out sorted, while Dictionary will not give you sorted keys.
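
For example, a minimal sketch:

using System;
using System.Collections.Generic;

class SortedIterationSketch{
    static void Main(){
        int[] keys = { 255, 2, 999, 24 };

        SortedDictionary<int, string> sorted = new SortedDictionary<int, string>();
        Dictionary<int, string> plain = new Dictionary<int, string>();
        foreach(int k in keys){
            sorted.Add(k, "value" + k);
            plain.Add(k, "value" + k);
        }

        // Always prints 2 24 255 999: the tree keeps keys in order.
        foreach(KeyValuePair<int, string> p in sorted) Console.Write(p.Key + " ");
        Console.WriteLine();

        // Prints the keys in hash-bucket order: no ordering guarantee.
        foreach(KeyValuePair<int, string> p in plain) Console.Write(p.Key + " ");
    }
}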

Jeffery
Posted on 4/30/2009 7:01:55 PM

Well, if I really did a foreach, that is a possibility, but it does not make sense for SortedDictionary to perform a sort on the key list each and every time we request it; I do not believe Microsoft developers would write such lousy code.

The interesting thing is that I cannot seem to find the source for Dictionary using Reflector; any idea where it resides?

I will try to perform a test next week to check the differences between doing foreach and direct indexing for Dictionary and SortedDictionary; there must be some benefit to SortedDictionary.

Vladimir Bodurov
Posted on 4/30/2009 10:37:13 PM

SortedDictionary does not perform a sort each time you request a key; it simply stores the data as a binary tree, and that's why it can iterate the keys in order.

If you had to show the keys in order, how would you do it with Dictionary? What if the keys are not just 1, 2, 3, 4 but something like 2, 24, 255, 999 and you don't know where the gaps are? With SortedDictionary you just use foreach. The code of Dictionary is in mscorlib,
in System.Collections.Generic.

Nugpot
Posted on 5/5/2009 4:10:36 PM

Your pictures look nice, and it looks like you tried your best, but your results are definitely not accurate. I did similar tests today and went on the web to see what other people found. Just to help everybody, I am going to add my results.
This is for adding 2,000,000 items to each list, timing how long it takes to load the data, searching for 3 values in the data (one at the end, one at the beginning, and one that doesn't exist), and then seeing how long it takes to unload the data. Here are the results.
Not your pretty pictures, but at least values that make sense.

5/5/2009 2:22:39 PM - Array 
5/5/2009 2:22:40 PM - End Adding data: 00:00:00.7499952 
5/5/2009 2:22:40 PM - Search complete: 00:00:00.0312498 
5/5/2009 2:22:40 PM - Data Cleared: 00:00:00.0312498 
5/5/2009 2:22:40 PM - Total time: 00:00:00.8124948

5/5/2009 2:22:40 PM - ArrayList 
5/5/2009 2:22:41 PM - End Adding data: 00:00:00.7343703 
5/5/2009 2:22:41 PM - Search complete: 00:00:00.0312498 
5/5/2009 2:22:41 PM - Data Cleared: 00:00:00.0312498 
5/5/2009 2:22:41 PM - Total time: 00:00:00.7968699

5/5/2009 2:22:41 PM - List<> 
5/5/2009 2:22:42 PM - End Adding data: 00:00:00.7343703 
5/5/2009 2:22:42 PM - Search complete: 00:00:00.0624996 
5/5/2009 2:22:42 PM - Data Cleared: 00:00:00.0312498 
5/5/2009 2:22:42 PM - Total time: 00:00:00.8281197

5/5/2009 2:22:42 PM - HashTable 
5/5/2009 2:22:44 PM - End Adding data: 00:00:02.2812354 
5/5/2009 2:22:44 PM - Search complete: 00:00:00 
5/5/2009 2:22:44 PM - Data Cleared: 00:00:00.0468747 
5/5/2009 2:22:44 PM - Total time: 00:00:02.3281101

5/5/2009 2:22:44 PM - Dictionary<> 
5/5/2009 2:22:46 PM - End Adding data: 00:00:01.4218659 
5/5/2009 2:22:46 PM - Search complete: 00:00:00 
5/5/2009 2:22:46 PM - Data Cleared: 00:00:00.0312498 
5/5/2009 2:22:46 PM - Total time: 00:00:01.4687406

5/5/2009 2:22:46 PM - SortedList<> 
5/5/2009 3:57:03 PM - End Adding data: 01:34:17.1991614 
5/5/2009 3:57:03 PM - Search complete: 00:00:00.1562490 
5/5/2009 3:57:05 PM - Data Cleared: 00:00:01.7343639 
5/5/2009 3:57:05 PM - Total time: 01:34:19.1678988

5/5/2009 3:57:05 PM - SortedDictionary<> 
5/5/2009 3:57:16 PM - End Adding data: 00:00:11.4061770 
5/5/2009 3:57:16 PM - Search complete: 00:00:00 
5/5/2009 3:57:16 PM - Data Cleared: 00:00:00.0156249 
5/5/2009 3:57:16 PM - Total time: 00:00:11.4218019

Vladimir Bodurov
Posted on 5/5/2009 4:34:43 PM

Thanks for posting your results Nugpot, but I really don't understand what you mean when you say "but your results are definitely not accurate", because your results are just like my results, in particular:
Insert Performance:
1. Dictionary - best
2. Hashtable - next
3. SortedDictionary - next
4. SortedList - worst
Just like my test. And your search performance is:
1. Hashtable, Dictionary, SortedDictionary - you don't have precise measurements
2. SortedList
Again, that fits my test. If you have different data the numbers will be different, but the order will be the same.

It is also not clear what you mean by "unload data", as you didn't post a reference to where your code is. Your insert and search results are just like mine; I can't speak for the rest because I haven't seen it, but if you compare things they must be similar,
not apples with screwdrivers.

Shital Shah
Posted on 10/13/2009 1:15:41 PM

What was the hardware used for these tests?

Sean
Posted on 1/5/2010 1:52:21 PM

FYI your average calculation is wrong. Averaging 2,3,4 would come out as ((((2 + 3) / 2) + 4) / 2) or 3.25, instead of 3.

Also, you should time how long it takes to execute all the lookups at once and take the average from that, instead of timing each individual one.

I ran some tests designed purely to test the performance of lookups, with no calling of GC (as .NET should pick when to run it for you in the real world), although I did semi-track inserts.

The test was inserting X keys, then doing X*2 lookups (X where the key existed, X where it didn't).

I let each of those tests of X keys run 10-15 times, and I came up with:

Key/Value          Keys           Lookup                              Insert
string/string      20,000         Dictionary is roughly 10% faster    Dictionary is roughly 30% faster
int/string         20,000         Dictionary is roughly 50% faster    Roughly equal
string/int         20,000         Dictionary is roughly 10% faster    Dictionary is roughly 30% faster
int/string         200,000        Dictionary is roughly 50% faster    Dictionary is roughly 30% faster
int/string         2,000          Dictionary is roughly 50% faster    Roughly equal
int/string         200            Dictionary is roughly 50% faster    Roughly equal
string/string      200            Dictionary is roughly 10% faster    Roughly equal
string/string      2,000          Dictionary is roughly 10% faster    Roughly equal
string/string      20,000         Dictionary is roughly 10% faster    Hashtable is roughly 15% faster
string/string      200,000        Dictionary edged it out by ~3%      Hashtable is roughly 15% faster
Vladimir Bodurov
Posted on 1/6/2010 4:32:33 AM

Thanks for sharing your results, Sean. You are right about the avg formula; the way I was doing it would shift the result toward the latest run, but in the end the result would still average out. Still, I corrected my code to make it cleaner, and I also extended
my test with a foreach loop test, to show why you are getting consistently better results for Dictionary compared to Hashtable. The thing is that you are not isolating the looping performance from the rest, and because Dictionary is better than Hashtable
for looping, as you can see in my latest run, you get a better overall result. All other tests are consistent with my previous results, and the old ones can be seen at:

http://blog.bodurov.com/images/IDictTest01_old.gif 
http://blog.bodurov.com/images/IDictTest02_old.gif 
http://blog.bodurov.com/images/IDictTest03_old.gif

Jeffery
Posted on 1/7/2010 5:56:08 PM

Hi Vladimir,

appreciate it if you can change the link to my post to http://jefferytay.wordpress.com/2009/04/16/performance-of-generics-sorteddictionary-and-dictionary/

Sean
Posted on 1/15/2010 12:58:05 PM

I made a few slight changes to your code and came up with different results.

With string keys, Dictionary performed searches roughly 10% faster with all combinations of keys I tried.
With int keys, Dictionary performed searches roughly 40% faster with all combinations of keys I tried.

1: I created another test method called TestGeneric which accepts IDictionary<String, String>. If you look in Reflector, calling Dictionary through an IDictionary reference results in validation code being called at runtime to ensure the objects passed in are
actually TKey (in this case string), as opposed to at compile time when it's generic (below is a sample of what it is actually doing). This appears to have provided the biggest performance increase.

            void IDictionary.Add(object key, object value) 
            { 
                Dictionary<TKey, TValue>.VerifyKey(key); 
                Dictionary<TKey, TValue>.VerifyValueType(value); 
                this.Add((TKey) key, (TValue) value); 
            } 

2: I changed the key to C_key{i} instead of {random_letter}_key{i} (to make #3 possible)

3: I changed your search from a single lookup to NumberInsertedKeys lookups, just checking for each value that had been inserted as the key (not using a foreach loop, to eliminate that possible conflict)

            watch = Stopwatch.StartNew(); 
            for (int i = 0; i < NumberInsertedKeys; i++) 
            { 
                string key = "C_key" + i.ToString(); 
                string searchResult = dict[key]; 
            } 
            watch.Stop(); 

4: I commented out the randomness to determine which

Also, when I commented out the garbage collection code to test memory usage, the Dictionary tests seemed to increase more (I personally prefer to test without garbage collection, since in production my code will rely on .NET to select when to run it for me anyway).
But that test does appear to be impacting the results of the other tests. (Disclaimer: I only ran it with string keys and 50,000 entries.)

I switched it back to using random letters and looking up every entry and got roughly the same results; I had to make more changes to do that, though, by caching the random entries and reusing them.

            Random rand = new Random(); 
            for (int i = 0; i < NumberInsertedKeys; i++) 
            { 
                Keys[i] = GetRandomLetter(rand, i) + "_key" + i; 
                Values[i] = "value" + i; 
            } 
 
           ........................ 
 
            for (int i = 0; i < NumberSearches; i++) 
            { 
                string key = Keys[i];// "C_key" + i.ToString();// GetRandomLetter(rand, i) + "_key" + i; 
                string value = Values[i];// "value" + i; 
                ......... 
            } 

Eric
Posted on 3/3/2010 8:10:05 AM

Thank you, this was really very helpful. Many of the constructors for these collections support an initial capacity. While I understand that it is not always possible to predict this capacity, I quite often do make a range estimate. If I recall, the default
initial capacity is quite low. This would result in frequent extensions of the list-style collections; the dictionary-style collections would expand more slowly. It might be interesting to examine some effects of initial capacity. Thanks again for your previous
work.
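
A minimal sketch of passing an initial capacity (not from the original test, with an assumed estimate of 50,000 entries); note that SortedDictionary, being tree-based, has no capacity constructor:

using System.Collections;
using System.Collections.Generic;

class CapacitySketch{
    static void Main(){
        const int estimate = 50000;

        // Pre-sizing avoids the repeated grow-and-copy (or grow-and-rehash)
        // cycles that start from the small default capacity.
        Dictionary<string, string> dict = new Dictionary<string, string>(estimate);
        Hashtable table = new Hashtable(estimate);
        SortedList list = new SortedList(estimate);
    }
}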

Raj Aththanayake
Posted on 4/10/2010 5:13:28 PM

Excellent blog. I was looking for this type of information. I was comparing SortedDictionary and SortedList; now I know SortedList is much better on performance.

Joshua DeLong
Posted on 5/26/2010 8:07:56 AM

Has anyone tried this type of thing with millions? I'm looking at putting roughly 30 million integer pairs into a hashtable for lookup purposes only. Any ideas which storage method would be best?

phoog
Posted on 11/25/2010 3:54:20 AM

Another factor: string.GetHashCode() iterates through the entire string, so presumably the longer the keys, the worse the performance you'll see from Dictionary. Another point about SortedList: if the items are added in order, its insertion performance is
better than SortedDictionary's, because SortedDictionary rebalances the tree, while SortedList just adds the item at the end without reshuffling anything.

Scanpst
Posted on 4/13/2011 6:02:20 AM

Sean, 
Thanks for the change in the code. I tried it and it worked great for me. It was very helpful.

Joshua, 
I have a project where I need to do this with millions also. Did you ever figure out what storage method worked best? Any tips would be great!
