Map, Filter, Fold with optimized recursion

I want to write my functional transformations like

// Some transformation lambdas
Function<Integer, Integer> doubleUp = x -> 2 * x;
Predicate<Integer> isEven = x -> 0 == x % 2;

// Some numbers 
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

// A way to get Iterators with the even numbers doubled up
Iterable<Integer> doubleEven = map(doubleUp, 
                                   filter(isEven, numbers));

// And then a way to sum those doubled values up 
// Result becomes: 12 (0 + 4 + 8) 
int result = fold(Integer::sum,
                  0, 
                  doubleEven);

rather than this awkward stream weirdness


Function<Integer, Integer> doubleUp = x -> 2 * x;
Predicate<Integer> isEven = x -> 0 == x % 2;

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
   
// Creating a stream each time, just because those streams aren't reusable
Function<List<Integer>, Stream<Integer>> doubleEven =
    list -> list.stream()
                .filter(isEven)
                .map(doubleUp);

// And then sum those values up creating that new stream each time, with pretty awkward syntax
int result = doubleEven.apply(numbers)
                       .reduce(0, 
                               Integer::sum);

Java Streams aren’t classic collections. They deliberately hide traversal order. For pure functions, that’s usually fine, pure code doesn’t care about the order of elements. This also opens up the possibility for parallel execution. Still, understanding the actual ordering guarantees isn’t straightforward. It depends on the source, not just on whether the stream is sequential or parallel. Code that works today might fail tomorrow if it relied on accidental behavior. If you’re planning to use Streams seriously, it’s a good idea to read the specification carefully.

Sometimes, though, we want control. Sometimes it’s worth giving up that last bit of performance to gain simplicity, to guarantee execution order, to allow safe side effects, and to make transformations more predictable.

Streams are a great addition, but they come with a fixed API. Adding new functionality can quickly become awkward. Luckily, Streams aren’t the only way to model functional transformations.

In functional programming languages, transformations often work on linked lists — typically lazy ones, where nothing is evaluated until it’s needed. You describe the computation up front, and it delivers values as you demand them.

It’s not common in Java to work directly with simple linked lists, like standalone cons cells. Instead, we prefer abstractions like Iterable to represent sequential data.

Iterators can indeed be lazy, producing new values only as next() is called. This fits well with the idea of describing computations up front and delivering results when needed. Supplier is a new default for delayed execution.

Let’s start with the long version of map, We want to be able to write code like:

Iterable<Integer> original = Arrays.asList(1,2,3,4);
Iterable<Integer> doubled = map(x -> x+2, original)
// and be able to print
for(int i : original) 
   System.out.println(i);
for(int i : doubled)
   System.out.println(i);

To have numbers 1-4 and 2-8 printed

Iterators.map using boilerplate (long version)

The most basic higher order functions found in functional languages are: map, filter and fold. Lets look at map, which applies a function to every element in a sequence.

If an Iterable should translate elements from another Iterable of type A, to elements of type V, we need an Iterable of V taking a transformer function from A to V, and the source Iterator of A

class LazyMapIterable<V, A> implements Iterable<V> {
  private final Function<? super A, ? extends V> transformer; 
  private final Iterable<? extends A> source;

  // Ctor with the transformer function, and the source Iterable. The 
  // transformer function will be appield on values of iterators of this
  public LazyMapIterable(Function<? super A, ? extends V> transformer, 
                         Iterable<? extends A> source) { 
    this.transformer = transformer;
    this.source = source;
  }

  // Return tranforming iterator, up on request 
  @Override public Iterator<V> iterator() {
    return new LazyMapIterator<>(source.iterator(), transformer);
  }
}

Even though the model we are starting to create here is based on the Iterable, it is their Iterators that does the heavy lifting. There is not much heavy lifting for something simple as the map function. The iterator gets an iterator from the source, and the transformer function as is, and applies that to each fetched value when requested, as a lazy calculation. Everything is done in next().

class LazyMapIterator<V, A> extends Iterator<V> {
  private final Iterator<? extends A> sourceIterator;
  private final Function<? super A, ? extends V> transformer;

  public LazyMapIterator(Iterator<? extends A> sourceIterator, 
                         Function<? super A, ? extends V> transformer) {
    this.sourceIterator = sourceIterator;
    this.transformer = transformer;
  }

  @Override public boolean hasNext() {
    return sourceIterator.hasNext();
  }

  // Transform each next value, up on request.
  @Override public V next() {
    if (sourceIterator.hasNext())
      return transformer.apply(sourceIterator.next());
    throw new NoSuchElementException();
  }
}

Creating an LazyMapIterable is abstracted away as a function, from Iterable to another Iterable, through the transformer function

<A, V> Iterable<V> map(Function<? super A, ? extends V> transformer,
                       Iterable<? extends A> source) {
  if(transformer == null || source == null)
    return null;
  return new LazyMapIterable<>(transformer, source);
}

The short version

That’s a lot of boilerplate for a single line: return f.apply(i.next());

Let’s simplify. All map-code above can be written as these two smaller functions:

// The map function that creates an Iterable, a functional interface, that creates an iterator 
public static <A, V> 
Iterable<V> map(final Function<? super A, ? extends V> f, 
                final Iterable<A> data) {
    requireNonNulls(data, f);
    return () -> {
            Supplier<Continuation<V>> supplier = () -> mapIterate(f, data.iterator());
            return lazy(supplier).iterator();
        };
}

private static <A, V> 
Continuation<V> mapIterate(final Function<? super A,? extends V> f, 
                            final Iterator<A> elements) {
    return !elements.hasNext()
            ? stop()
            : seq(()->mapIterate(f, elements),
                  f.apply(elements.next()));
}

The map function creates an Iterable, a functional interface, so as a lambda that creates an iterator. The Iterator returns a lazy, trampoline based, iterator, wich transforms each element as it reads elements from any iterator of the supplied source, in mapIterator.

This is much more concise than the boilerplate version we started with, above.

Why not streams

Streams are very effective, but also very complex. Ordering is not obvious and paralellism has to be considered. It is not recomended to introduce side effects in streams as they may not occur as intended. Orderig can suddenly change, and the work steeling algorithm is used in the parallel operation is not prepared for side effects. It will most likely not only do things we did not expect, but also not work well technically. It is designed to work without waiting code, where computation constantly occur. Exception handling is also tricky with streams. Passing streams to foreign functions is very risky, and not carefully crafted behavior may change with new releases. There are many laws to consider.

We already understand Iterables. They are easy to work with. All Java programmers already understand ArrayList and the for each construct. It is not difficult for most to understand that mutation through an iterator makes things difficult, that Iterator.remove() is toublesome. Most are just not used to do functional transformations with iterables like above.

Streams does not build a promise. It is executed once. It is not a reusable model for execution. With Iterables, every iterator becomes it’s own flow, and you can create as many of them as you want.

Iterable<Integer> original = Arrays.asList(1,2,3,4);
Iterable<Integer> doubled = map(x -> x * 2, original)
original.forEach(System.out::println);
doubled.forEach(System.out::println);
//And doubled produced can be used how many times you like. Here it will print 2,4,6 and 8 each time.
doubled.forEach(System.out::println);

A stream throws exception if it is executed again. It does not form a promise, like lazy Iterables do.

List<Integer> original = Arrays.asList(1, 2, 3, 4);
Stream<Integer> doubledStream = original.stream().map(x -> x + 2);
original.forEach(System.out::println);
// But doubledStream can only be printed once
doubledStream.forEach(System.out::println);
// Trying to print it again will throw Exception!!
doubledStream.forEach(System.out::println); // => Exception!!

This opens upp for more flexibity for functional transformation with Iterables. You can always create an iterator and look at intermediate results, and it becomes much easier to compose computation steps.

Iterable<Integer> original = Arrays.asList(1,2,3,4);
Iterable<Integer> doubled = map(x -> x * 2, original);
Iterable<Integer> incremented = map(x -> x + 1, doubled);
// prints 2,4,6 and 8
doubled.forEach(System.out::println); 
// prints 3,5,7 and 9
incremented.forEach(System.out::println);

Trampoline-based lazy sequences

A trampoline avoids stack overflows in recursive calls. We saw it earlier in Tail Call Recursion in Java

Here we did use trampoline to initiate optimized recursion, while the recursive function used either recur to bounce on to the next call, or done to bounce out of the trampoline. We had this recursive factorial as an example:

long factorial(int n){
    return trampoline(factorial(0, n));
}
Supplier<Continuation<Long>> factorial(long acc, int n) {
    return () -> n == 0 ? done(acc) : recur(factorial(acc * n, n - 1));
}

The trampoline has now been extended with lazy, that returns an Iterable instead of a scalar from trampoline. It is a lazy Iterable, with Iterators that produce values laziliy, on demand, as next() is called.

The recursive function, producing elements uses seq to produce values lazily, our old done to produce a last value, or stop to stop bounsing, end the sequence, without a last value.

/** 
 * Will call fun followed by fun of recur(fun) returned from fun 
 * until done(value) is returned from fun 
 */
<V> V trampoline(Supplier<Continuation<V>> fun);
/** return done to when last value is found */
<V> Continuation<V> recur(Supplier<Continuation<V>> f);
/** return done to when last value is found */
<V> Continuation<V> done(V v);

/** 
 * Emits a lazy Iterable recuring until stop or done 
 * In contrast to trampoline that emits a single value
 */
<V> Iterable<V> lazy(Supplier<Continuation<V>> fun);
/**
 * return seq to yield lazy value v, and continue withLast f a
 * f as null is same as done or stop, depending on v
 * Will behave like recur on a trampoline
 */
<V> Continuation<V> seq(Supplier<Continuation<V>> f, V v);
/** Return stop when done without a value. Null if on trampoline */
<V> Continuation<V> stop();


class Continuation<V> {
    /** The next function */   
    public final Supplier<Continuation<V>> fun;
    /** has Value or Continuation */
    public final boolean hasValue;
    /** The final value, if present */
    public final V value;

    private Continuation(Supplier<Continuation< V>> fun, 
                         V value, 
                         boolean hasValue) {
        this.fun = fun;
        this.value = value;
        this.hasValue = hasValue;
    }
}

Again, we are producing values recursivly, without building stack, but now driven by an Iterator.

Backwards or mutable

As map is a function that produces a new Iterable it is not supposed to change the original. This makes it pretty important for map of an single linked list or an Iterable to be lazy, as the order of elements would otherwise be reversed. When adding an element last we would otherwise have to create the whole list of elements every time we add something, unless we add it first and the sequence of elements becomes reversed.

Lazy filter

The next fundamental function, filter, does not transform but pass through elements that fit a criteria to a new Iterable. It has a predicate rather than a transformer and uses the old recur in the trampoline, rather than seq when a element is ignored. The returned Iterable is lazy, its Iterators filter the elements within next().

public static <T> 
Iterable<T> filter(final Predicate<? super T> predicate, // Predicate rather than transformation
                   Iterable<T> source) {
    requireNonNulls(predicate, source);
    return () -> {
        Iterator<T> sourceIterator = source.iterator();
        return sourceIterator.hasNext()
                ? lazy(filterI(predicate, sourceIterator)).iterator()
                : emptyIterator();
    };
}

private static <T>
Supplier<Continuation<T>> filterI(Predicate<? super T> predicate,
                                  Iterator<T> elements) {
    return () -> {
        if (!elements.hasNext())
            return stop();
        T next = elements.next();
        return predicate.test(next)
                ? Continuation.seq(filterI(predicate, elements), next)
                // Recur rather than seq when the element is ommitted
                : Continuation.recur(filterI(predicate, elements));
    };
}

Example: Compose filter, map and fold

// The new way to define functions as data
Function<Integer, Integer> doubleUp = x -> 2 * x;
Predicate<Integer> isEven = x -> 0 == x % 2;

// Some data 
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

// Definition of a lazy transformation, as an Iterable
Iterable<Integer> doubleEven= map(doubleUp, filter(isEven, numbers));

// While a terminal fold functions takes an iterator, and  
// sums the values, starting with a base accumulator of 0. 
// Result becomes: 12 (0 + 4 + 8) 
int result = fold((sum, x) -> sum + x, 
                  0, 
                  doubleEven);

The simplest of the 3 classic transformation function is fold or reduce, that accumulates the values into a scalar value, as seen above. It is terminal so not difficult at all. We are used to writing terminal, strict, non lazy, functions with Iterables.

The version where an accumulator is explicitly supplied is usually called fold.

public static <A, V> 
A fold(BiFunction<? super A, ? super V, ? extends A> f, 
       A accumulator, 
       Iterable<? extends V> elements) {
    requireNonNulls(f, elements);
    // Accumulate by applying each with the accumulator to the BiFunction 
    for (V v : elements)
        accumulator = f.apply(accumulator, v);
    return accumulator;
}

Summary

We have shown that simple Iterables can be used for functional transformation. Streams are not necesary for this at all, and sometimes you need better control.

Using a trampoline allows us to both avoid avoid stack overflows with recursion and implement lazy iteration without heavy boilerplate

By splitting each transformation into a bootstrap method and a recursive core, we can elegantly process data in a functional, lazy, and expressive way—even in Java.

Iterables.java

The impementation is provided as a single source file rather than a library, for you to integrate smoothly and extend to your needs.

But this rabbit hole goes deeper, and this post continues.


All #art #clojure #csharp #data-structures #database #datomic #emacs #fsharp #functional #haskell #history #immutability #java #jit #jmm #lambdas #lisp #pioneers #poetry #programming #programming-philosophy #randomness #rant #reducers #repl #smalltalk #sql #threads #women