What are the main differences between a Rust Iterator and C++ Iterator?


Keywords:c++ 


Question: 

A typical example of a C++ iterator is a pointer, and can be used to point at an element in a C array like so:

int array[] = {1, 2, 3, 4};
int* begin = std::begin(array); //Starting iterator
int* end = std::end(array) //Ending iterator

for(int* i = begin; i < end; i++)
{
    std::cout << *i << ',';
}

//Prints 1, 2, 3, 4

This is simple enough. The definition of an iterator from cplusplus.com is

An iterator is any object that, pointing to some element in a range of elements (such as an array or a container), has the ability to iterate through the elements of that range using a set of operators...

This makes sense; in the above code there were two iterators (the begin and the end iterators), and it used a for loop and incremented.

In Rust, an iterator is used like this:

let vect = vec![1, 2, 3, 4];

let vect_iter = vect.iter();

What? To iterate it you do:

vect_iter.next();
vect_iter.next();

I couldn't find any exact definition of a pointer in the Rust docs, but looking at the Iterator trait, it is seems that an iterator is a wrapper for a container that enables for easier processing, by standardizing the logic in a way (if that makes sense at all).

The main questions I have are:

  1. What are the main differences?
  2. Why does Rust have iterators in this fashion and why are they expressed so differently?
  3. Are there Rust-type iterators in C++?
  4. Are there C++-type iterators in Rust?
  5. Are they called something specific? (Internal/External?)

3 Answers: 

An iterator is a concept found in programming languages to refer to a construct which allows iterating over collections or sequences of elements. The concept is purposely vague, it's a concept! It does not prescribe any specific implementation.

To more easily distinguish C++ from Rust, I am going to use different names:

  • C++ iterators will be named cursors,
  • Rust iterators will be named streams.

Yes, those are completely arbitrary. Note that if you look at languages like Java or C#, you will find that they also use streams.


C++

First of all, don't use cplusplus.com. cppreference.com is much better.

An iterator is any object that, pointing to some element in a range of elements (such as an array or a container), has the ability to iterate through the elements of that range using a set of operators...

Simple, and wrong.

A cursor may either:

  • point to an element,
  • or be singular and point to no element at all.

In general, the singular value is used to represent:

  • the end of the sequence to iterate over: vec.end(),
  • the absence of an element: std::find(...).

You can increment, and sometimes decrement, a cursor. If you do so, you however generally need a pair of cursors to know when to stop.

Why did C++ use such a representation? Because that's how C did it, and it works pretty well... although it's error prone.


Rust

Rust endeavors to be safe and favors APIs which are easy to use. This rules out a pair of cursors:

  • a pair of cursors is not safe: you can easily iterate out of bounds, and you can obtain aliasing references,
  • a pair of cursors is error-prone: it's easy to accidentally pair off cursors from two different sequences.

In order to control bounds, aliasing and avoid pair mismatch, you have to use a single object; thus the stream-like API.

The Iterator API in Rust is reminiscent of that of Java and C#, although Rust improves upon it by using Option<T> so that instead of the clumsy hasNext()/next() call pair, it offers a single method next() which both advances the stream and may signal its end.


Conclusion

Both Rust and C++ features a way to iterate over a collection of elements:

  • C++ offers a C-like way, flexible but error-prone,
  • Rust offers a modern way, safe but less flexible.

Both languages also offer external and internal iteration:

  • External: the user controls the iteration (calls ++ or next()),
  • Internal: the iterator controls the user code (see std::foreach and Iterator::foreach).


Iterators in Rust and C++ are conceptually quite different.

C++

In C++, an iterator is like a pointer. Iterators refer to an object, they can be incremented to refer to the next object, and they can be compared for equality with other iterators. Iterators can also refer to no object at all—they can refer to the “one past the end” element of a sequence, or they can be “singular” (which is like a null pointer). Some iterators support additional operations like moving both forwards and backwards, random access, and being copied.

A pointer in C++ is a valid iterator, but there are also other types which are iterators.

Iterators do not represent a sequence of elements, at least, that is not the convention. In C++, if you want a sequence of elements, you need a pair of iterators*: one for the beginning and one for the end. You aren't forced to iterate over the elements in sequence, you can do all sorts of other things. For example, if you want to reverse an array in C++, you can do it with iterators:

#include <algorithm>
#include <iterator>
#include <cstdio>
#include <utility>

template <typename T, std::size_t N>
void reverse_array(T (&arr)[N]) {
    using std::swap;
    auto left = std::begin(arr), right = std::end(arr);
    while (left < right) {
        --right;
        swap(*left, *right);
        ++left;
    }
}

int main() {
    int x[] = {1, 2, 3, 4, 5};
    reverse_array(x);
    for (const auto it : x) {
        std::printf("%d\n", it);
    }
    return 0;
}

But you can quickly generalize it to work on any container with bidirectional iterators:

#include <algorithm>
#include <iterator>
#include <list>
#include <cstdio>
#include <utility>

template <typename Iterator>
void reverse_any(Iterator left, Iterator right) {
    using std::swap;
    while (left != right) {
        --right;
        if (left == right)
            break;
        swap(*left, *right);
        ++left;
    }
}

int main() {
    std::list<int> list{1, 2, 3, 4, 5};
    reverse_any(std::begin(list), std::end(list));
    for (const auto it : list) {
        std::printf("%d\n", it);
    }
    return 0;
}

Rust

In Rust, an iterator is like a slice. Iterators refer to a sequence of objects, and elements can be accessed from the iterator using the next() method. In a sense, this means an iterator in Rust has both the begin and end iterator inside it. Reimplementing the C++ code above in Rust, you'll get something like this:

fn reverse_any<'a, T: 'a, Iter>(mut iter: Iter)
where
    Iter: DoubleEndedIterator<Item = &'a mut T>,
{
    while let Some(left) = iter.next() {
        if let Some(right) = iter.next_back() {
            std::mem::swap(left, right);
        }
    }
}

fn main() {
    let mut v = [1, 2, 3, 4, 5];
    reverse_any(v.iter_mut());
    println!("{:?}", v);
}

This has the added benefit of safety. Iterator invalidation is one of the most common sources of errors in C++ programs, but Rust eliminates the problem completely.

The cost is that if you want to mutate the elements, you are limited to a single (possibly double-ended) iterator in Rust, while in C++, you can have as many iterators as you want working with the same container. While single-ended and double-ended ranges are the most common case for iterators, there are some algorithms that use the additional flexibility that C++ provides.

One simple example I can think of is C++'s std::remove_if. A straightforward implementation of remove_if would use three iterators: two iterators for keeping track of the range of elements being scanned, and a third iterator for tracking the elements being written. You could translate std::remove_if to Rust, but it would not be able to operate on normal Rust iterators and still modify the container in-place.

Another simple example is the Dutch National Flag problem, which commonly uses three iterators. The solution to this problem is often used for partitioning elements for quicksort, so it's an important problem.

Summary

A Rust iterator is almost equivalent to a start + end C++ iterator pair. C++ allows you to use multiple iterators and move iterators forwards and backwards. Rust guarantees that you don't accidentally use an invalid iterator, but you can only use one at a time and it can only move in one direction.

I don't know of any terminology for distinguishing these types of iterators. Note that Rust-style iterators are much more common, iterators in C#, Python, Java, etc. work the same way but they might have slightly different names (they are called "enumerators" in C#).

Footnotes

*: Technically this is not true. You only need to have one iterator in C++, however, it is conventional to have a pair and library functions typically operate on pairs of iterators (so you "need" two iterators if you want to use those functions). The fact that you have a (start,end) pair does not mean that sequences are bounded, the end iterator can be infinitely far away. Think of it like having a range (0,∞) in mathematics... ∞ isn't really a number, it's just a placeholder that lets you know that the range is unbounded on the right.

: Remember that just because the “end” iterator exists in C++ does not mean that the sequence actually has an end. Some end iterators in C++ are like infinity. They don’t point to valid elements, and no matter how many times you iterate forward, you won’t reach infinity. In Rust, the equivalent construction is an iterator that never returns None.



I see three things going on here. Let's break it down.

The Idea of an Iterator

When you call C++'s std::begin and Rust's .iter() in your examples, you are receiving two "types of objects" that are conceptually identical: An iterator.

If we forget about the implementation details for a moment, we can see the purpose and usability of an iterator happen to be similar in both languages. We find that both iterators:

  • Are "objects" which can be created from a collection (an "Iterable type")
  • Can be advanced using C++'s std::advance and Rust's .next()
  • Have an "end," determined by C++'s std::end and the output of Rust's .next().

This is a gross oversimplification of course, they are similar and different in many other ways, but this is probably the general overview you're looking for.

The Implementation of an Iterator

Despite sharing common themes, C++ and Rust are very different languages and will naturally implement a single idea differently. Iterators are no exception.

The "why" is too broad to really answer here on Stack Overflow. It's like asking why oranges are orange and bananas are not :)

But you do seem somewhat confused about how to work with Rust's implementation of iterators, given your comment:

I couldn't find any exact definition of a pointer in the Rust docs

Don't think like a C++ programmer right now. Check out The Book if you haven't already and explore the concepts of borrowing and ownership; It is the far more typical way you will work with data, and it is required to understand how Rust's iterators work.

The Syntactical Sugar for Iterators

Both C++ and Rust have "magic" in their for loops that let you work with iterator "types" easily.

Contrary to your question, this is not a concept unique to Rust. In C++ an object can be used with the modern for (item : collection) syntax if it implements special methods, similar to the Iterator trait you pointed out.

Summary

What are the main differences?

Not much conceptually.

Why does Rust have iterators in this fashion and why are they expressed so differently?

It be like it is because it do. They are more similar than you believe.

Are there Rust-type iterators in C++? Are there C++-type iterators in Rust?

They are conceptually identical.

Are they called something specific? (Internal/External?)

There might be some fancy academic terminology for the implementation differences but I'm not aware of it. An iterator is an iterator.