Iterators: Effortless sequencing

Iterators: Effortless sequencing

Performant lazily.

ยท

9 min read

Imagine a library with lots of books, in a "C# term," a collection of books. You find yourself on a search for a specific book, navigating through numerous bookshelves. Finally, after some searching, you found it.

Now, consider an alternative scenario, instead of looking for a book yourself, you could've asked the librarian. The librarian would hand you the exact book you're looking for. Iterators function much like this librarian. Instead of traversing through the entire collection of book, they promptly provide you with the exact book you seek. The librarian and iterator share a common trait, they give you something only when you ask for it. In above example, it was a book. You asked the librarian for that specific book, you got the book.

An iterator is a method that provides sequence or source for an enumeration using the yield break; or yield return; statement. And yes, it is necessary to use yield keyword to declare an iterator block. yield return statement produces a value and pauses the execution whereas yield break statement terminates the execution of the iterator method. Iterators can return one of the following types:

  • IEnumerable

  • IEnumerable<T>

  • IEnumerator

  • IEnumerator<T>

where T can be any type parameter or regular type

If the return type of iterator method is non-generic, the yield type is object whereas for generic, the yield type is the provided type parameter. Eg:

  • Non-Generic Iterator method (return type is object) :
public IEnumerable NonGenericIteratorBlock() {
    yield return "hello from non-generic iterator block.";
}
  • Generic iterator method (return type is of type parameter, i.e. int) :
public IEnumerable<int> GenericIteratorBlock() {
    yield return 42;
    yield return 100;
}

Iterators stops the execution in the following cases:

  • Exception occurs within the iterator block

  • Reaches yield return statement which says it is ready to produce the value

  • Hits the yield break statement which signals end of the sequence, hence MoveNext() returns false

  • End of the method or the iterator block (MoveNext() returns false)

Here is another simple iterator method that prints even numbers. It uses the yield keyword to generate even numbers on demand :

public IEnumerable<int> GenerateEvenNumbers()
{
    for (int i = 0; ; i++)
    {
        yield return i * 2;
    }
}

The absence of condition in the for loop makes it an infinite. To get first 10 even numbers :

Example example = new Example(); // Initialize class which contain iterator method
IEnumerable<int> evenNumbers = example.GenerateEvenNumbers();// iterator method
// invoke iterator by taking first 10 
Console.WriteLine("Iterator block:");
foreach (int evenNumber in evenNumbers.Take(10))
{
    Console.WriteLine(evenNumber);
}

The loop stops after processing 10 elements, demonstrating the lazy evaluation of the iterator. Output of this snippet:

It is totally possible to create a method with similar result without using yield or iterator block.

public List<int> GenerateEvenNumbersNonIterator(int totalCount)
{
    List<int> evenNumbers = new List<int>();
    int i = 0;
    while (evenNumbers.Count < totalCount)
    {
        evenNumbers.Add(i);
        i += 2;
    }
    return evenNumbers;
}
Console.WriteLine("Non-iterator block:");
foreach (int evenNumber in example.GenerateEvenNumbersNonIterator(10))
{
    Console.WriteLine(evenNumber);
}

Output:

Though the end results may seem similar, the main difference is lazy evaluation. Before digging into lazy evaluation, let's look at IEnumerable and IEnumerator.

IEnumerable

IEnumerable interface from System.Collections namespace enables a collection to be enumerated or sequenced over. In simpler terms, implementing IEnumerable allows you to iterate through the collection using a loop. IEnumerable is the base class for all non-generic collections that can be enumerated. It has a single method GetEnumerator() which returns an IEnumerator.

IEnumerator

IEnumerator interface from System.Collections namespace is a standard mechanism used for iterating over a collection. It has methods and properties like Current (to get the current item or element), MoveNext() (advances to the next element), Reset() (resets the iterator block or sets enumerator to its initial value). Enumerator is used to read underlying data in the collection and is not designed for mutable(changing or modifying) operations.

IEnumerable/IEnumerator relationship

Think of an IEnumerable as a playlist of music (collection of music) and IEnumerator as a music player. IEnumerable represents a collection of playlist that you can iterate over. IEnumerator as a music player, it keeps track of the current song in the playlist (collection). The music player is responsible for playing songs one at a time. IEnumerator keeps track of the current song in the playlist (collection) as you iterate over it. You can play or skip each song using a loop and the MoveNext() method respectively. It doesn't change the playlist at all. It only change the music player's current state.

You can have multiple playlists (collections of music). You can also have multiple music players (instances of IEnumerator), each playing its playlist independently. Moving the music player to the next song doesn't affect other music players or the playlists itself.

When you want to start playing through songs, you ask the playlist (IEnumerable) for a new music player (IEnumerator) with IEnumerable.GetEnumerator(). This new music player is set to play through the songs in the playlist. Now, you're in control, able to pause, skip, and enjoy the music. What's interesting is that these actions, like moving to the next song using MoveNext(), only impact the current state of your music player, not the original playlist.

Lazy evaluation

In context of iterator block, lazy evaluation is a mechanism where a sequence is delayed or paused until its value is actually needed. It avoids unnecessary computation until the result is required. With the yield keyword, the iterator provides values on demand and one at a time. So the entire collection is not generated and stored in memory upfront. Iterators can have the following pros :

  • Efficient memory usage: As I mentioned, the entire collection is not stored in memory upfront. Hence, storing only when it is required.

  • Infinite sequence: Since the values are generated as needed, working with "infinite" sequence is feasibly practical.

  • Performant: Iterator avoids unnecessary computation, resulting in improved performance. The difference can be measured when working with resource intensive computation or calculations.

Here is a snippet to compute infinite even numbers:

// example is a instance of a class that has the iterator GenerateEvenNumbers()
IEnumerable<int> enumerable = example.GenerateEvenNumbers();
IEnumerator<int> enumerator = enumerable.GetEnumerator();
while (enumerator.MoveNext())
{
    int val = enumerator.Current;
    Console.WriteLine(val);
}
 // Ctrl + C to stop the execution.

Here, we first created an enumerable of type int (IEnumerable<int>), serving as an iterator for generating even numbers. Then, we instantiated an enumerator for that enumerable using GetEnumerator(), which returns an enumerator of type int. This sets up the collection to be iterated over one element at a time, initializing the enumerator to its initial element. Following that, a while loop checks enumerator.MoveNext(). The MoveNext() method moves the enumerator to the next element in the sequence. If an element exists, it returns true; otherwise, it returns false. The loop continues as long as MoveNext() returns true. enumerator.Current is a property that retrieves the current element in the sequence indicated by the enumerator.

Let's recall again to execute this "infinite" iterator 10 times :

IEnumerable<int> enumerable = example.GenerateEvenNumbers();
foreach(int val in enumerable.Take(10))
{
    Console.WriteLine(val);
}

enumerator.MoveNext() returns true for the first 10 iterations of the loop. The Take(10) method ensures that the loop iterates only 10 times by limiting the number of elements taken from the infinite sequence.

Working mechanism of lazy evaluation

In the above example, we have a iterator GenerateEvenNumbers(); which is a IEnumerable<int>. Nothing happens right away when we call it. Execution happens when we call MoveNext() of GenerateEvenNumbers()'s IEnumerator. The cool part? It doesn't bother generating all the numbers at once. It only works when prompted. Call MoveNext(), get a number. Don't call it, and the iterator stays in sleep mode, not bothering with the rest of the sequence. When MoveNext() is called again, it continues from where it left. Enumerator keeps track of the execution state allowing it to seamlessly pick up from where it left off. It's like bookmark in the book.

Lazy evaluation is like asking for even numbers one at a time using GenerateEvenNumbers(). It doesn't bother showing all the numbers at once; only when you say "Show me more" with MoveNext(). It remembers where it left off, acting like a bookmark in a book, making it easy to continue when you want.

Being lazy enhances efficiency and resources utilization. Unlike other means, where all computations are performed upfront, lazy evaluation delays computation until the result is actually needed. In GenerateEvenNumbers(), it generates even numbers on-demand using yield return. Without lazy evaluation, attempting to generate an infinite sequence would result in a program trying to calculate an infinite number of values, which is not practical or feasible. Nature of lazy enables you to work with potentially infinite data structures in a way that is both efficient and effective.

Using LINQ in Iterators

LINQ operates seamlessly with iterators allowing you to manipulate and be expressive with large set data. Here is a simple example to print even numbers greater than 10 and less than 100.

var conditionalEvenNumbers = example.GenerateEvenNumbers().Where(x => x > 10 & x < 100);
foreach(int val in conditionalEvenNumbers)
{
    Console.WriteLine(val);
}

This snippet prints out even numbers greater than 10 and less than 100 but there's a catch. GenerateEvenNumbers() will never terminate because the sequence is infinite. MoveNext() is not called after 98 (which is the last element from the where clause). The IEnumerator keeps track of its internal state, including the position in the sequence, allowing you to seamlessly continue from where you left off. So, when you conditioned the sequence, the state of the collection is still preserved by IEnumerator. This state-preserving behavior is a fundamental characteristic of iterators and lazy evaluation.

var conditionalEvenNumbers = example.GenerateEvenNumbers().Where(x => x > 10 & x < 100);
foreach(int val in conditionalEvenNumbers.Take(5))
{
    Console.WriteLine(val);
}

Here, we take first 5 elements. This will only take 5 elements that are greater than 10 and less than 100. This ensures that the iterator terminates and doesn't run indefinitely.

When working with iterators, its crucial to handle errors to ensure robustness. Use yield break; to exit iterators. If you want to end the sequence when some condition is met, use yield break;

Summary

  1. Iterator:

    • Method that provides sequence or source for an enumeration.

    • Implemented using yield return and yield break statements.

    • Requires the yield keyword in iterator blocks for lazy execution.

  2. Return Types of Iterators:

    • Can return types such as IEnumerable, IEnumerable<T>, IEnumerator, or IEnumerator<T>.

    • The return type of non-generic iterators is object, while generic iterators return the specified type parameter.

  3. Halting Execution:

    • Iterators pause execution in cases of exceptions, encountering yield return, reaching yield break, or reaching the end of the iterator block.
  4. Lazy Evaluation:

    • Lazy evaluation is a key feature of iterators.

    • Mechanism where a sequence is delayed or paused until its value is actually needed.

    • Demonstrated by generating even numbers on demand with the yield keyword.

    • Non-iterator methods compute results upfront, in contrast to the lazy evaluation of iterators.

  5. IEnumerable and IEnumerator Relationship:

    • IEnumerable represents a collection, and IEnumerator moves through elements one at a time.
  6. Efficient Resource Utilization:

    • Lazy evaluation enhances efficiency by avoiding unnecessary computation until the result is needed.

    • Enables working with potentially infinite data structures.

  7. LINQ with Iterators:

    • LINQ operates seamlessly with iterators, allowing expressive manipulation of data.
  8. Handling Errors in Iterators:

    • Using yield break to exit iterators when necessary.

    • Using try-catch properly ensures smooth runtime execution.

Sourcecode.

ย