Performance of raw pointers vs smart pointers in C++11

In this post I analyse and discuss the performance of raw pointers vs smart pointers in a few C++11 benchmarks. The operations tested are creation, copy, data access and destruction.


Disclaimer

These are limited tests based on simple code I wrote without considering any kind of optimization or coding / design style. They test exclusively STL smart pointers and I ran them only on my Linux machine using gcc 4.8 to compile them.

Things could be totally different in your program running your code on a different machine.

The Benchmarks

The following benchmarks measure the performance of raw pointers, std::unique_ptr and std::shared_ptr in C++11.

Accessing data

The first benchmark simply calls a function of the pointed object using the following code:

for(int r = 0; r < NUM_REPS; ++r)
    p->Do();

The code in Do() is pretty simple and it involves only 2 increments, 1 addition and 1 if.
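The post does not list the ActorIncrement class itself. As a rough idea of what such a class might look like, here is a minimal sketch that assumes only what is stated above (2 increments, 1 addition and 1 if in Do()). The member names, STEP and LIMIT are hypothetical and not taken from the original benchmark.

```cpp
// Hypothetical stand-in for the benchmark's ActorIncrement class.
// Only the names Do() and GetVal() come from the post; the rest is assumed.
const int STEP  = 2;     // assumed value added on every call
const int LIMIT = 1000;  // assumed threshold checked by the if

class ActorIncrement
{
public:
    ActorIncrement() : mCalls(0), mVal(0) {}

    void Do()
    {
        ++mCalls;          // increment 1
        ++mVal;            // increment 2
        mVal += STEP;      // addition

        if(mVal > LIMIT)   // the single if
            mVal = 0;
    }

    // simple inline getter
    int GetVal() const { return mVal; }

private:
    int mCalls;
    int mVal;
};
```

With these assumed constants, each call to Do() on a fresh object raises the value by 3 until it passes LIMIT.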

Construction, data access and destruction

For each pointer the code creates a new (dummy) object, performs some operation with it and then destroys it.

The code used to test C++ raw pointers is the following:

for(int r = 0; r < NUM_REPS; ++r)
{
    ActorIncrement * p = new ActorIncrement;

    p->Do();
    val += p->GetVal();

    delete p;
}

The code used to test C++ std::unique_ptr is the following:

for(int r = 0; r < NUM_REPS; ++r)
{
    std::unique_ptr<ActorIncrement> p(new ActorIncrement);

    p->Do();
    val += p->GetVal();
}

The code used to test C++ std::shared_ptr is the following:

for(int r = 0; r < NUM_REPS; ++r)
{
    std::shared_ptr<ActorIncrement> p(new ActorIncrement);

    p->Do();
    val += p->GetVal();
}

A second test aims at checking the performance of std::make_shared compared to the previous way of creating a shared pointer. The code used for this test is the following:

for(int r = 0; r < NUM_REPS; ++r)
{
    std::shared_ptr<ActorIncrement> p = std::make_shared<ActorIncrement>();

    p->Do();
    val += p->GetVal();
}

The code in Do() is the same as in the previous benchmark, whereas GetVal() is a simple inline function returning a value.

Copy

A common use of pointers is passing data around. This is what the last benchmark tests.

The code used to test C++ raw pointers is the following:

ActorIncrement * p = new ActorIncrement;

for(int r = 0; r < NUM_REPS; ++r)
{
    ActorIncrement * p2 = p;

    p2->Do();
    val += p2->GetVal();

    TestRaw(p);
    TestRaw(p);
}

The TestRaw function repeats the first 3 lines of the loop body and is declared like this:

int TestRaw(ActorIncrement * p);

The code used to test C++ std::shared_ptr is the following:

std::shared_ptr<ActorIncrement> p(new ActorIncrement);

for(int r = 0; r < NUM_REPS; ++r)
{
    std::shared_ptr<ActorIncrement> p2 = p;

    p2->Do();
    val += p2->GetVal();

    val += TestShared(p);
    val += TestShared2(p);
}

The TestShared and TestShared2 functions repeat the first 3 lines of the loop body and are declared like this:

int TestShared(const std::shared_ptr<ActorIncrement> & p);
int TestShared2(std::shared_ptr<ActorIncrement> p);

Basically the only difference is how the shared pointer is passed to the function (const reference vs value).
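To make the difference concrete: passing by value copies the std::shared_ptr, which adds an owner for the duration of the call, while passing by const reference does not. The sketch below shows this through use_count(). The helper names UseCountByRef and UseCountByValue and the trivial ActorIncrement stand-in are hypothetical and not part of the benchmark.

```cpp
#include <memory>

// Trivial stand-in for the benchmark class (assumption, for illustration only).
struct ActorIncrement
{
    int val = 0;
};

// Const reference: no copy is made, so the reference count is unchanged.
long UseCountByRef(const std::shared_ptr<ActorIncrement> & p)
{
    return p.use_count();
}

// By value: the parameter is a copy, so the count is one higher inside the call
// and drops back when the function returns.
long UseCountByValue(std::shared_ptr<ActorIncrement> p)
{
    return p.use_count();
}
```

Starting from a single owner, UseCountByRef(p) reports 1 and UseCountByValue(p) reports 2. That extra count update on entry and exit is the only additional work TestShared2 does compared to TestShared.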

Results

Here are the results of running the benchmark code 1,000,000 times (NUM_REPS = 1000000).

The tests were executed on a 64-bit Kubuntu Linux 14.04 machine powered by an Intel i7-4770 CPU @ 3.40GHz and 16 GB of DDR3 RAM. They were compiled using g++ 4.8.4 with the following flags: -O3 -s -Wall -std=c++11.

All times are in milliseconds; lower values are better.
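The post does not show how the times were taken. One common way to get millisecond timings in C++11 is std::chrono::steady_clock, sketched below around the std::unique_ptr loop. TimeMs, RunUniquePtrLoop and the ActorIncrement stand-in are hypothetical names used for illustration, not the original measurement code.

```cpp
#include <chrono>
#include <memory>

const int NUM_REPS = 1000000;

// Assumed stand-in for the benchmark class: 2 increments, 1 addition, 1 if.
class ActorIncrement
{
public:
    ActorIncrement() : mCalls(0), mVal(0) {}

    void Do()
    {
        ++mCalls;
        ++mVal;
        mVal += 2;
        if(mVal > 1000)
            mVal = 0;
    }

    int GetVal() const { return mVal; }

private:
    int mCalls;
    int mVal;
};

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
long long TimeMs(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

// The std::unique_ptr construction / access / destruction loop.
// Returning val keeps the result observable so the loop is not optimised away.
int RunUniquePtrLoop()
{
    int val = 0;

    for(int r = 0; r < NUM_REPS; ++r)
    {
        std::unique_ptr<ActorIncrement> p(new ActorIncrement);

        p->Do();
        val += p->GetVal();
    }

    return val;
}
```

A call such as TimeMs([&val]{ val = RunUniquePtrLoop(); }) would then give one sample. Repeating the measurement several times and comparing the samples helps smooth out noise from the rest of the system.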

Accessing data

raw pointer    std::unique_ptr    std::shared_ptr
5              5                  5

As expected there’s no notable difference during normal usage of any pointer.

Construction, data access and destruction

raw pointer    std::unique_ptr    std::shared_ptr    std::make_shared
23             23                 46                 27

Things get more interesting when considering the whole life of pointers.

As expected, a std::shared_ptr is more expensive to use than a raw pointer, because it performs extra operations and allocates extra memory (the control block that holds the reference counts) to handle the automatic memory management. It’s important to notice that despite a 100% increase in time, we are still talking about 23ms per 1M pointers. That means that unless your code looks like the one in the benchmark (and it shouldn’t), it’s never going to be a real issue.

It’s also interesting to notice how std::make_shared almost makes the time gap disappear. That’s because it performs a single heap allocation instead of 2 when creating the std::shared_ptr. The time gain comes at a price: because the object and the control block share one allocation, the memory of an object created using std::make_shared cannot be released while any std::weak_ptr still points to it (the object itself is still destroyed when the last std::shared_ptr goes away). That doesn’t happen when the std::shared_ptr is created using the new Object syntax, where the object’s memory is freed as soon as the last std::shared_ptr is destroyed.
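A small sketch can help separate the two things a std::weak_ptr could in principle affect. The object's destructor runs as soon as the last std::shared_ptr is gone, even with std::make_shared. What a remaining std::weak_ptr can delay is only the release of the underlying memory block, which cannot be observed directly in portable code. Tracked and DestroyedWhileWeakAlive are hypothetical names used for illustration.

```cpp
#include <memory>

// Sets a flag in its destructor so the moment of destruction can be observed.
struct Tracked
{
    explicit Tracked(bool & destroyed) : mDestroyed(destroyed) { mDestroyed = false; }
    ~Tracked() { mDestroyed = true; }

    bool & mDestroyed;
};

// Returns true if the object was destroyed while a std::weak_ptr still existed.
bool DestroyedWhileWeakAlive()
{
    bool destroyed = false;

    std::shared_ptr<Tracked> sp = std::make_shared<Tracked>(destroyed);
    std::weak_ptr<Tracked> wp = sp;   // observes the object, does not own it

    sp.reset();                       // last owner gone: the destructor runs here

    // wp is still in scope, yet the object is gone and cannot be locked again.
    return destroyed && wp.expired() && !wp.lock();
}
```

DestroyedWhileWeakAlive() returns true, so the weak pointer does not keep the object alive. With std::make_shared only the raw storage (shared with the control block) stays allocated until wp itself is destroyed.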

Copy

raw pointer    std::shared_ptr
18             19

This means there’s no noticeable difference between raw pointers and shared pointers in this test.

Conclusion

It’s fair to say that smart pointers do not create any real performance issue and that they can be safely used over classic raw pointers in pretty much any normal situation.

14 Comments

  1. Yamakaky

    Did you activate optimizations? If not, this comparison isn’t really useful.

    1. ubik

      Last line: “They were compiled using g++ 4.8.4 with the following flags: -O3 -s -Wall -std=c++11.”

      So optimizations were on.

    2. Davide Coppola (Post author)

      Yes of course, I added the compilation flags to the post.

      1. Yamakaky

        Sorry…

  2. Bill Torpey

    Your comment about weak_ptr keeping an object alive is incorrect.

    If a weak_ptr is created from a shared_ptr, then it will participate in reference counting. If it is created from a raw pointer, it will not.

    FWIW, this is also true for shared_ptr — if two shared_ptr’s are created from the same *raw* pointer, they don’t know about each other, will have separate reference counts, and one of them is guaranteed to dangle.

    If you want to have multiple smart pointers (weak or shared makes no difference), the second and subsequent smart pointers MUST be created from a smart pointer, not from a raw pointer.

    A good explanation can be found at: http://thispointer.com/create-shared_ptr-objects-carefully/

    1. Chris Cleeland

      The OP’s comment about weak pointers is definitely incorrect, but in a way different from what you suggest. The whole point of a weak pointer is that it DOES NOT participate in the reference counting that keeps the shared pointer alive; the weak pointer API enforces this by ensuring that the only thing one can do with a weak pointer is copy it or convert it to a shared_ptr of the same type, which is itself a copy of the shared_ptr from which the weak_ptr was obtained.

      I’m confused by your assertion that “if a weak_ptr is created from a shared_ptr, then it will participate in the reference counting.” There is no other way to obtain a weak_ptr than through a shared_ptr or copy/move of another weak_ptr. Most importantly, there is no way to create a weak_ptr from a raw pointer (http://en.cppreference.com/w/cpp/memory/weak_ptr/weak_ptr).

      And weak_ptr’s definitely do not participate in the counts of strong references. Most shared_ptr implementations do track the number of extant weak_ptrs, but a non-zero weak_ptr count does not prevent destruction of the owning shared_ptr.

      1. Davide Coppola (Post author)

        I am just reporting what is stated in the Stack Overflow discussion I posted below.

        It’s something which I haven’t tested myself yet, but I will and eventually will blog about it.

      2. Gabriel Sanchez

        What he means is the difference between creating a std::shared_ptr using new vs std::make_shared. In the former case, the control block and the object are allocated separately, so when the last std::shared_ptr is destroyed, the object can be deallocated (the control block stays alive if there are any remaining std::weak_ptr). In the latter case, both the object and control block are allocated in a single operation. This means that the object’s memory will remain allocated until the last std::weak_ptr is destroyed.

      3. Bill Torpey

        What I wanted to get across was that a weak_ptr will keep the *control block* alive as long as any weak_ptr to the object exists.

        Unfortunately I couldn’t find the diagram I was looking for, and my choice of words was unclear. (Still can’t find the diagram I was looking for, but this is not bad: https://goo.gl/images/cWbKz2)

        That, plus the fact that it is all too easy to create two separate, unrelated shared_ptr’s from a single raw pointer.

        1. Chris Cleeland

          Thanks for the clarification; makes much more sense.

    2. Davide Coppola (Post author)

      What you are referring to are common pitfalls when using smart pointers.

      What I am talking about is something completely different, you can read more about it here: http://stackoverflow.com/questions/20895648/difference-in-make-shared-and-normal-shared-ptr-in-c

  3. Mattias Johansson

    Good post! People are really too afraid of using the smart pointers for performance reasons whereas they are in fact really cheap to use.

  4. Sascha

    Not sure I believe the results here without further analysis. Measuring things in a very small loop that is executed millions of times tends to create results not representative of real world performance, due to techniques like loop unrolling and the superscalar nature of CPUs, which can parallelize certain things in a loop that they may not be able to otherwise.

    I suspect that the cost of shared_ptr is quite a bit higher than presented here, since it involves an atomic increment/decrement every time the shared_ptr is passed around. People tend to pass shared_ptr’s around by value for some reason, mistaking them for raw ptrs. This is quite an expensive operation, and probably outweighs the cost of creation and destruction by quite a bit. Unique_ptr’s don’t really have this issue, since they force developers to do the right thing (move instead of copy).

    Except for some very specific use cases, I don’t really see the need for anybody to use shared_ptr’s in single threaded code. A combination of unique_ptr’s and raw ptrs/references should be preferred. There is a reason why there is no shared_ptr implementation without an atomic integer inside; that pretty much gives a hint that shared_ptr’s are primarily for multi threaded code, and their overhead should be avoided in single threaded applications.

  5. Uwe

    I am always surprised how many experts really believe that they can write the most efficient code in the world by manually optimizing their C/C++ code with obscure techniques for a final gain of 12 microsecs in the main loop, only to forget to sort their main data structure once instead of over and over again inside each iteration. That is where people should focus, IMHO.

    The most important statement of Davide’s article, I think, definitely is: “we are still talking about 23ms per 1M pointers. That means that unless your code looks like the one in the benchmark (and it shouldn’t), it’s never going to be a real issue”.

    Thank you Davide, for a great and useful analysis!
