Note - 'remove' does not physically delete any objects from any container, it merely shifts other elements around, leaving "invalid" data at the end of the range.PrefaceThe term
container is a fairly general-purpose term, used to describe a feature or mechanism in the C++ programming language which holds a collection of objects or data items of a certain type. Examples of containers include lists, deques, vectors, and even strings.
[Note - a string can be thought-of as a collection of characters]Range is a word used to describe a series or sequence of objects/data items stored in a particular container. Often, the
range that a programmer is interested in, will be every single element from the beginning to the end of a container.
- This article refers to the
remove function, as found in the C++
<algorithm> standard header, within the
std namespace; and assumes the reader already has a basic familiarity with at least one of the STL containers and its associated iterator(s). (Although not all STL containers are suitable for use with
remove - notably,
stack and
queue)
An
iterator is an object or variable which acts as a "
place-holder", that is tasked with remembering the position of an element within a container. iterators are often used as temporary variables which help manipulate a container. They are also commonly used by containers to provide reference points to the beginning and end of their stored objects/data items.
A data set using STL vectors The example in this article will remove all occurrances of
5, from a vector container which stores
int objects.
The example container (Which will be called
myvec) looks like this
vector<int> myvec; and will be given a few test values
CODE
myvec.push_back(10);
myvec.push_back(5);
myvec.push_back(-8);
myvec.push_back(5);
myvec.push_back(1);
myvec.push_back(4);
At the moment, its range is
[ 10, 5, -8, 5, 1, 4 ]. Here,
5 appears in multiple places scattered through the container.
In order to delete all elements containing
5 out of
myvec, the most efficient way is to rearrange the data so that all the unwanted elements are overwritten, by shifting other elements back from the other end of the container. This has the side-effect of leaving a bunch of unwanted objects at the far-end, but at least they are all together and can be destroyed efficiently.
The remove algorithm (Abstract)
Given the set of data [ 10, 5, -8, 5, 1, 4 ] - removing all elements which match
5 uses the following process.
- start a counter of matched elements at zero. step to the first element
10 5 -8 5 1 4 - No match found.
- step to the next element
10 5 -8 5 1 4 - a match is found, increase the match counter by 1.
- step to the next element
10 5 -8 5 1 4 - No match found,
- perform a leftwise-shift of 1 position.
10 -8 -8 5 1 4
- step to the next element
10 -8 -8 5 1 4 - a match is found, Increase the match counter by 1.
- step to the next element
10 -8 -8 5 1 4 - No match found
- perform a leftwise-shift of 2 positions
10 -8 1 5 1 4
- step to the next element,
10 -8 1 5 1 4 - No match found
- perform a leftwise-shift of 2 positions
10 -8 1 4 1 4
- hit the end of the list. return an iterator to the first "invalid" element, ready for deletion.
10 -8 1 4 1 4
Each leftwise-shift causes an element to be
copied from its original position - The "invalid" objects at the end still contain their original data - however, that data is now unwanted in the list. The returned iterator should be used for destroying the invalid objects, shortening the data set.
A closer look at removeThe visible prototype for the
remove function is as follows
CODE
FwdIt remove(FwdIt first, FwdIt last, Ty val )
remove is a template function (or, a 'generic' cookie cutter for a function) which doesn't exist until provided with more information. The prototype gives some clues as to the missing information;
FwdIt and
Ty. (Note: these two names for the missing information are in fact template parameters - so they are likely to be named something else depending on the compiler and the implementation of the standard library used)
The
remove function requires that
FwdIt must be some sort of iterator (a "place holder") capable of stepping through a container, and that
Ty must be able to represent the type of data that is stored by that container - in this case, the stored data is known to be
int.
Using remove in the exampleThe STL vector is supplied with two different iterator types,
iterator and
const_iterator . The const_iterator does not allow any modification of the vector, so that is no use here -
iterator must be used instead - hence the full type of the iterator is now known to be
vector<int>::iteratorWith this information, the
remove prototype can be expanded to make more more sense, here is the full prototype which will be used by the real function in the program.
vector<int>::iterator remove( vector<int>::iterator first, vector<int>::iterator last, int val ) Now its clear - the first two parameters are iterators (placeholders) for the start and finish points in the data set, and the third parameter is the value to be removed.
Usefully, the STL vector has two methods which are used to obtain iterators to the begining and the end of the stored data. these methods are
begin() and
end() respectively. Now the call to remove looks like this
remove( myvec.begin(), myvec.end(), 5 ); The last thing to do, is to create an iterator variable, to keep track of the invalid data. This will be called
invalidCODE
vector<int>::iterator invalid;
invalid = remove( myvec.begin(), myvec.end(), 5 );
Destroying invalid objectsDestroying the invalid objects at the end of the data set depends somewhat on what kind of container is being used. for a vector, the
erase method is needed.
erase comes in two flavours - the first simply removes one element, which is of no use, since the data set will have two invalid objects at the end. the other variation of
erase removes a range of elements.
erase will insist on having a
vector<int>::iterator to work with - which is handy, because that's exactly the same type as the
invalid iterator left by
remove. The end of the invalid elements will be the end of the container. Thus, the call to the
erase method will look like this.
myvec.erase( invalid, myvec.end() ); remove in actionThis complete, compilable example shows a data set before, and after the
remove operation. The data set has 2 elements removed, leaving 2 'rogue' data items at the end of the data, which are eventually destroyed by the
myvec.erase method.
CODE
#include <iostream>
#include <vector>
#include <algorithm>
int main()
{
using namespace std;
//Populate myvec with the data set 10, 5, -8, 5, 1, 4
vector<int> myvec;
myvec.push_back(10);
myvec.push_back(5);
myvec.push_back(-8);
myvec.push_back(5);
myvec.push_back(1);
myvec.push_back(4);
cout << "\n\n Initial data set: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
//Remove the data elements matching '5'
vector<int>::iterator invalid;
invalid = remove( myvec.begin(), myvec.end(), 5 );
cout << "\n\n Data set after remove: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
//Destroy the remaining invalid elements
myvec.erase( invalid, myvec.end() );
cout << "\n\n Data set after erase: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
}
ConclusionThe C++
remove algorithm is often the most efficient way to remove elements from an un-sorted array-based container, such as the STL vector. the remove algorithm minimises the amount of copying and resizing typically involved in manipulating such containers, and is used by other named functions in the <algorithm> header, such as
remove_if and
remove_copy_if, which both allow greater flexibility when selecting elements for deletion, by including a parameter for a function or functor which determines whether or not an object should be removed. Functions and functors used in this way generally return a boolean value, and are known as
predicates - Predicates are used instead of a single 'match' value.
When
remove is used in conjunction with arrays, pointers act as iterators, however, depending on the array and its use in context, the method of destroying invalid objects will vary - In some cases, the objects may not be destroyed, but simply left, if an index or pointer variable is used to track the active range of the array.
The remove algorithm should not be used on sorted containers such as
set and
map which are based on a Binary Tree structure, and depend on the sorted order of their elements to work correctly.
ReferencesThe C++ Standard Library, Nicolai M Josuttis, Addison-Wesley, 1999.
The C++ Programming Language, Bjarne Stroustrup, Addison-Wesley, 1997.