Javanotes 9, Section 4.5 -- Lambda Expressions

In a running program, a subroutine is just a bunch of binary numbers (representing instructions) stored somewhere in the computer's memory. Considered as a long string of zeros and ones, a subroutine doesn't seem all that different from a data value such as an integer, a string, or an array, which is also represented as a string of zeros and ones in memory. We are used to thinking of subroutines and data as very different things, but inside the computer, a subroutine is just another kind of data. Some programming languages make it possible to work with a subroutine as a kind of data value. In Java 8, that ability was added to Java in the form of something called lambda expressions.

Lambda expressions are becoming more and more common in Java programs. They are especially useful for working with the stream API that will be covered in Section 10.6. However, we won't encounter them again until near the end of Chapter 5, so you can skip this section for now if you want.

4.5.1 First-class Functions

Lambda is a letter in the Greek alphabet that was used by the mathematician Alonzo Church in his study of computable functions. His lambda notation makes it possible to define a function without giving it a name. For example, you might think that the notation x² is a perfectly good way of representing a function that squares a number, but in fact, it's an expression that represents the result of squaring x, which leaves open the question of what x represents. We can define a function with x as a dummy parameter:

static double square( double x ) {
    return x*x;
}

but to do that, we had to name the function square, and that name becomes a permanent part of the program—which is overkill if we just want to use the function once. Alonzo Church introduced the notation lambda(x).x² to represent "the function of x that is given by x²" (except using the Greek letter instead of the word "lambda"). This notation is a kind of function literal that represents a value of type "function" in the same way that 42 is an integer literal that represents a value of type int.

Having function literals is the starting point for thinking of a function as just another kind of data value. Once we do that, we should be able to do the same things with functions that we can do with other values, such as assign a function to a variable, pass a function as a parameter to a subroutine, return a function as the value of a subroutine, or even make an array of functions. A programming language that allows you to do all those things with functions is said to have "first-class functions" or "functions as first-class objects."

In fact, you can do all of those things with Java lambda expressions. Java's notation is different from the one used by Alonzo Church, and in spite of the name "lambda expression" it does not even use the word lambda. In Java, the lambda expression for a squaring function like the one above can be written

x  ->  x*x

The operator -> is what makes this a lambda expression. The dummy parameter for the function is on the left of the operator, and the expression that computes the value of the function is on the right. You might see an expression like this one being passed as an actual parameter to a subroutine, assigned to a variable, or returned by a function.

So are functions now first-class in Java? I'm not quite sure. There are some cool things that can be done in other languages but can't be done in Java. For example, in Java we can assign the above expression to a variable named, say, sqr, but we can't then use sqr as if it actually is a function. For example, we can't say sqr(42). The problem, really, is that Java is a strongly typed language; to have a variable named sqr, we must declare that variable and give it a type. But what sort of type would be appropriate for a value that is a function? The answer in Java is something called a functional interface, which we turn to next.

But first one more note: Lambda expressions in Java can actually represent arbitrary subroutines, not just functions. Nevertheless, it is the term "function" that is usually associated with them, rather than "subroutine" or "method."

4.5.2 Functional Interfaces

To know how a subroutine can be legally used, you need to know its name, how many parameters it requires, their types, and the return type of the subroutine. A functional interface specifies this information about one subroutine. A functional interface is similar to a class, and it can be defined in a .java file, just like a class. However, its content is just a specification for a single subroutine. Here is an example:

public interface FunctionR2R {
    double valueAt( double x );
}

This code would be in a file named FunctionR2R.java. It specifies a function named valueAt with one parameter of type double and a return type of double. (The name of the parameter, x, is not really part of the specification, and it's a little annoying that it has to be there.) Here is another example:

public interface ArrayProcessor {
    void process( String[] array, int count );
}

Java comes with many standard functional interfaces. One of the most important is a very simple one named Runnable, which is already defined in Java as

public interface Runnable {
    public void run();
}

I will use these three functional interfaces for examples in this section.

"Interfaces" in Java can be much more complicated than functional interfaces. You will learn more about them in Section 5.7. But it is only functional interfaces that are relevant to lambda expressions: a functional interface provides a template for a subroutine that might be represented by a lambda expression. The name of a functional interface is a type, just as String and double are types. That is, it can be used to declare variables and parameters and to specify the return type of a function. When a type is a functional interface, a value for that type can be given as a lambda expression.

4.5.3 Lambda Expressions

A lambda expression represents an anonymous subroutine, that is, one without a name. But it does have a formal parameter list and a definition. The full syntax is:

( parameter-list )  ->  { statements }

As with a regular subroutine, the parameter-list can be empty, or it can be a list of parameter declarations, separated by commas, where each declaration consists of a type followed by a parameter name. However, the syntax can often be simplified. First of all, the parameter types can be omitted, as long as they can be deduced from the context. For example, if the lambda expression is known to be of type FunctionR2R, then the parameter type must be double, so it is unnecessary to specify the parameter type in the lambda expression. Next, if there is exactly one parameter and if its type is not specified, then the parentheses around the parameter list can be omitted. On the right-hand side of the ->, if the only thing between the braces, { and }, is a single subroutine call statement, then the braces can be omitted. And if the right-hand side has the form { return expression; }, then you can omit everything except the expression.

For example, suppose that we want a lambda expression to represent a function that computes the square of a double value. The type of such a function can be the FunctionR2R interface given above. If sqr is a variable of type FunctionR2R, then the value of the function can be a lambda expression, which can be written in any of the following forms:

sqr = (double x) -> { return x*x; }; // The full lambda expression syntax!
sqr = (x) -> { return x*x; };
sqr = x -> { return x*x; };
sqr = x -> x*x;
sqr = (double fred) -> fred*fred;
sqr = (z) -> z*z;

The last two statements are there to emphasize that the parameters in a lambda expression are dummy parameters; their names are irrelevant. The six lambda expressions in these statements all define exactly the same function. Note that the parameter type double can be omitted because the compiler knows that sqr is of type FunctionR2R, and a FunctionR2R requires a parameter of type double. A lambda expression can only be used in a context where the compiler can deduce its type, and the parameter type has to be included only in a case where leaving it out would make the type of the lambda expression ambiguous.

Now, in Java, the variable sqr as defined here is not quite a function. It is a value of type FunctionR2R, which means that it contains a function named valueAt, as specified in the definition of interface FunctionR2R. The full name of that function is sqr.valueAt, and we must use that name to call the function. For example: sqr.valueAt(42) or sqr.valueAt(x) + sqr.valueAt(y).
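To see the difference between sqr and a true function in action, here is a minimal sketch of a complete program. The class name SquareDemo is made up for illustration, and FunctionR2R is nested inside it only so that the example fits in a single file; in the text it would be in its own file, FunctionR2R.java.

```java
class SquareDemo {

    // Nested here only to keep the example in one file (an assumption
    // of this sketch, not how the text organizes its files).
    interface FunctionR2R {
        double valueAt( double x );
    }

    public static void main(String[] args) {
        FunctionR2R sqr = x -> x*x;

        // sqr(42) would be a syntax error.  The function inside sqr
        // has to be called by its full name, sqr.valueAt.
        double a = sqr.valueAt(42);
        double b = sqr.valueAt(3) + sqr.valueAt(4);

        System.out.println(a);   // prints 1764.0
        System.out.println(b);   // prints 25.0
    }
}
```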

When a lambda expression has two parameters, the parentheses are not optional. Here is an example of using the ArrayProcessor interface, which also demonstrates a lambda expression with a multiline definition:

ArrayProcessor concat;
concat = (A,n) -> { // parentheses around (A,n) are required!
    String str;
    str = "";
    for (int i = 0; i < n; i++)
        str += A[i];
    System.out.println(str);
};  // The semicolon marks the end of the assignment statement;
    //      it is not part of the lambda expression.

String[] nums;
nums = new String[4];
nums[0] = "One";
nums[1] = "Two";
nums[2] = "Three";
nums[3] = "Four";
for (int i = 1; i < nums.length; i++) {
   concat.process( nums, i );
}

This will print out

One
OneTwo
OneTwoThree

Things get more interesting when a lambda expression is used as an actual parameter, which is the most common use in practice. For example, suppose that the following function is defined:

/** 
 *  For a function f, compute f(start) + f(start+1) + ... + f(end).
 *  The value of end should be >= the value of start.
 */
static double sum( FunctionR2R f, int start, int end ) {
    double total = 0;
    for (int n = start; n <= end; n++) {
        total = total + f.valueAt( n );
    }
    return total;
}

Note that since f is a value of type FunctionR2R, the value of f at n is actually written as f.valueAt(n). When the function sum is called, the first parameter can be given as a lambda expression that matches the type FunctionR2R. For example:

System.out.print("The sum of n squared for n from 1 to 100 is ");
System.out.println( sum( x -> x*x, 1, 100 ) );
System.out.print("The sum of 2 raised to the power n, for n from 1 to 10 is ");
System.out.println( sum( num -> Math.pow(2,num), 1, 10 ) );
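For readers who want to try this out, the pieces can be assembled into one small program along the following lines. This is only a sketch: the class name SumDemo and the variable cube are invented for the example, and FunctionR2R is nested inside the class just to keep everything in one file.

```java
class SumDemo {

    // Nested only so that the sketch is self-contained.
    interface FunctionR2R {
        double valueAt( double x );
    }

    /**
     *  For a function f, compute f(start) + f(start+1) + ... + f(end).
     *  The value of end should be >= the value of start.
     */
    static double sum( FunctionR2R f, int start, int end ) {
        double total = 0;
        for (int n = start; n <= end; n++) {
            total = total + f.valueAt( n );
        }
        return total;
    }

    public static void main(String[] args) {
        // Lambda expressions written directly as actual parameters.
        System.out.println( sum( x -> x*x, 1, 100 ) );              // prints 338350.0
        System.out.println( sum( num -> Math.pow(2,num), 1, 10 ) ); // prints 2046.0

        // A lambda expression stored in a variable can be passed in the same way.
        FunctionR2R cube = x -> x*x*x;
        System.out.println( sum( cube, 1, 3 ) );   // 1 + 8 + 27, prints 36.0
    }
}
```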

As another example, suppose that we have a subroutine that performs a given task several times. The task can be specified as a value of type Runnable:

static void doSeveralTimes( Runnable task, int repCount ) {
    for (int i = 0; i < repCount; i++) {
        task.run();  // Perform the task!
    }
}

We could then say "Hello World" ten times by calling

doSeveralTimes( () -> System.out.println("Hello World"), 10 );

Note that for a lambda expression of type Runnable, the parameter list is given as an empty pair of parentheses. Here is an example in which the syntax is getting rather complicated:

doSeveralTimes( () -> { 
        // count from 1 up to some random number between 5 and 25
    int count = 5 + (int)(21*Math.random()); 
    for (int i = 1; i <= count; i++) { 
        System.out.print(i + " ");
    }
    System.out.println();
}, 100);

This is a single subroutine call statement in which the first parameter is a lambda expression that extends over multiple lines. The second parameter is 100, and the semicolon on the last line ends the subroutine call statement.
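A Runnable does not have to be written in place as an actual parameter. As a rough sketch, the following program stores two lambda expressions in variables of type Runnable and then passes the variables to doSeveralTimes. The class name RepeatDemo and the variable names greet and countdown are made up for this example.

```java
class RepeatDemo {

    static void doSeveralTimes( Runnable task, int repCount ) {
        for (int i = 0; i < repCount; i++) {
            task.run();  // Perform the task!
        }
    }

    public static void main(String[] args) {
        // A single subroutine call statement, so no braces are needed.
        Runnable greet = () -> System.out.println("Hello World");
        greet.run();                  // call it directly, once
        doSeveralTimes( greet, 3 );   // then three more times

        // A multiline definition needs the braces.
        Runnable countdown = () -> {
            for (int i = 3; i >= 1; i--) {
                System.out.print(i + " ");
            }
            System.out.println("Liftoff!");
        };
        doSeveralTimes( countdown, 2 );
    }
}
```

Running this should print "Hello World" four times, followed by two lines that each read "3 2 1 Liftoff!".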

We have seen examples of assigning a lambda expression to a variable and of using one as an actual parameter. Here is an example in which a lambda expression is the return value of a function:

static FunctionR2R makePowerFunction( int n ) {
   return x -> Math.pow(x,n);
}

Then makePowerFunction(2) returns a FunctionR2R that computes the square of its parameter, while makePowerFunction(10) returns a FunctionR2R that computes the 10-th power of its parameter. This example also illustrates the fact that a lambda expression can use other variables in addition to its parameter, such as n in this case (although there are some restrictions on when that can be done).
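The following sketch puts makePowerFunction to work, and it also shows an array of functions, which we have not yet seen. In addition, it illustrates the main restriction on using outside variables: a local variable can be used in a lambda expression only if its value is never changed after it is first assigned (Java calls such a variable "effectively final"). The class name PowerDemo and the variable names are invented for the example.

```java
class PowerDemo {

    // Nested only so that the sketch is self-contained.
    interface FunctionR2R {
        double valueAt( double x );
    }

    static FunctionR2R makePowerFunction( int n ) {
        // n is never reassigned, so the lambda expression may use it.
        return x -> Math.pow(x,n);
    }

    public static void main(String[] args) {
        FunctionR2R square = makePowerFunction(2);
        System.out.println( square.valueAt(5) );   // prints 25.0

        // An array of functions: powers[i] computes the i-th power.
        FunctionR2R[] powers = new FunctionR2R[4];
        for (int i = 0; i < powers.length; i++) {
            powers[i] = makePowerFunction(i);
        }
        System.out.println( powers[3].valueAt(2) );  // prints 8.0

        int k = 4;
        FunctionR2R fourth = x -> Math.pow(x,k);
        // k = 5;   // Uncommenting this assignment would make the lambda
        //          // expression above a compile-time error, since k
        //          // would no longer be effectively final.
        System.out.println( fourth.valueAt(2) );   // prints 16.0
    }
}
```

Note that each call to makePowerFunction produces a separate function that remembers its own value of n.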

4.5.4 Method References

Suppose that we want a lambda expression to represent the square root function as a value of type FunctionR2R. We could write it as x -> Math.sqrt(x). However, this lambda expression is a simple wrapper for a Math.sqrt function that already exists. Instead of writing out the lambda expression, that function can be written as a method reference, which takes the form Math::sqrt. (Recall that in Java, "method" is another word for "subroutine.") This method reference is just a shorthand for the lambda expression, and it can be used wherever that lambda expression could be used, such as in the sum function defined above:

System.out.print("The sum of the square root of n for n from 1 to 100 is ");
System.out.println( sum( Math::sqrt, 1, 100 ) );

It would be nice if we could simply use the name Math.sqrt here instead of introducing a new notation with ::, but the notation Math.sqrt was already defined to mean a variable named sqrt in the Math class.

More generally, a lambda expression that simply calls an existing static method can be written as a method reference of the form

classname :: method-name
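As a small illustration of this form, the sketch below uses method references to two static methods from the Math class and to one static method of our own. The class name StaticRefDemo and the method cube are hypothetical, introduced only for this example.

```java
class StaticRefDemo {

    // Nested only so that the sketch is self-contained.
    interface FunctionR2R {
        double valueAt( double x );
    }

    static double sum( FunctionR2R f, int start, int end ) {
        double total = 0;
        for (int n = start; n <= end; n++) {
            total = total + f.valueAt( n );
        }
        return total;
    }

    // A static method of our own (hypothetical, for illustration).
    static double cube( double x ) {
        return x*x*x;
    }

    public static void main(String[] args) {
        FunctionR2R root = Math::sqrt;    // same as  x -> Math.sqrt(x)
        System.out.println( root.valueAt(16) );    // prints 4.0

        // Math.abs has several versions; the one that takes a double
        // is selected because FunctionR2R requires a double parameter.
        System.out.println( sum( Math::abs, -3, 3 ) );          // prints 12.0

        // The same notation works for a static method in our own class.
        System.out.println( sum( StaticRefDemo::cube, 1, 3 ) ); // prints 36.0
    }
}
```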

Furthermore, this notation extends to methods that are in objects rather than classes. For example, if str is a String, then str contains the method str.length(). The method reference str::length could be used as a lambda expression of type SupplyInt, where SupplyInt is the functional interface

public interface SupplyInt {
    int get( );
}
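
Here is a minimal sketch of how that might look in a program. The class name LengthDemo and the variable names are made up for the example, and SupplyInt is nested inside the class only to keep everything in one file.

```java
class LengthDemo {

    // Nested only so that the sketch is self-contained.
    interface SupplyInt {
        int get( );
    }

    public static void main(String[] args) {
        String str = "Hello World";

        SupplyInt viaReference = str::length;        // method reference
        SupplyInt viaLambda = () -> str.length();    // equivalent lambda expression

        System.out.println( viaReference.get() );  // prints 11
        System.out.println( viaLambda.get() );     // prints 11
    }
}
```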