Saturday, April 10, 2010

STL Annoyance

If you are using the std::vector or std::deque containers and want to obtain an iterator to a location in your container, don't even think about doing this:

std::vector<int> foo;
// Set the iterator once we've created the container
std::vector<int>::iterator iter = foo.begin();
// Add some stuff to it
foo.push_back(2);
foo.push_back(3);
// Use the iterator... oops!
while (iter != foo.end()) {
std::cerr << *iter << "\n";
++iter;
}
view raw badbegin1.cpp hosted with ❤ by GitHub


Or this:

std::vector<int> foo;
// Add an item before setting the iterator...
// Should work, right?
foo.push_back(2);
std::vector<int>::iterator iter = foo.begin();
// Add another item
foo.push_back(3);
// Use the iterator... oops!
while (iter != foo.end()) {
std::cerr << *iter << "\n";
++iter;
}
view raw badbegin2.cpp hosted with ❤ by GitHub


So, you can't obtain a valid iterator if you try to obtain it before your container has any items. You also can't obtain a valid iterator if you obtain it while your container has an item, but then you add more items to it and try to use the original iterator afterward.

What gives?

Calls to insert() (or push_back()) when size() == capacity() will force the container to reallocate its memory (they double in size). A new block of memory is allocated and then the old block is copied to the new block, and the old block deleted. This is bad news if you've obtained an iterator to an item in the old block of memory because it still points there, but the data is no longer there.

It should be obvious why calls to reserve() have the same effect.

So, what can you do?
  • Don't use an iterator obtained prior to a call to insert() or reserve()  on your std::vector or std::deque
  • Keep the iterator updated
Keeping the iterator updated:

#include <vector>
#include <iostream>
using namespace std;
int main() {
std::vector<int> foo;
// Add an item and then obtain an iterator to it
foo.push_back(3);
vector<int>::iterator current = foo.begin();
// Check the addresses of our iterator and the first item
std::cerr
<< "Address of the current iterator: " << &current << std::endl;
std::cerr
<< "Address of our first object: " << &foo.at(0) << std::endl;
// Add a new item to the end and update our iterator
current = foo.insert(foo.end(),4);
std::cerr
<< "Address of the current iterator (same as before): "
<< &current << std::endl;
// Check the address of the first item
std::cerr
<< "Address of our first object different from before): "
<< &foo.at(0) << std::endl;
// See if our iterator is valid
std::cerr << "current (4): " << *current << std::endl;
return 0;
}
// output
//
// Address of the current iterator:
// 0xbf809f40
// Address of our first object:
// 0x8ba1008
// Address of the current iterator after adding a new item:
// 0xbf809f40
// Address of our first object (different from before):
// 0x8ba1018
// current (4): 4


Note how the address of our iterator never changes, but the address of the item it points to does change. Therein lies the heart of the matter: When these containers reallocate memory, the data gets moved, but nobody tells the iterators to those pieces of data.

It should also be noted that calls to erase() invalidate any iterators that point to data positioned after the element(s) being erased. The reason should be obvious, as the data is no longer in the same location.

Moral of the story: Don't use an iterator that was obtained prior to calls on your std::vector or std::deque that will cause them to allocate or deallocate memory. Doing so will invalidate your iterators, and using them will yield undefined behavior, which is a nice way of saying "Garbage data at best, An explosion at worst."

For a full explanation of when your iterators will be invalidated (there's more situations than I've listed), consult chapter 23 of the Official C++ Standard (search for all occurrences of the word "valid"):

No comments: