In .NET, LINQ offers an easy way to query datasets. Java has no direct equivalent, but since the introduction of Java 8 in 2014 you can use "streams". Both LINQ and streams simplify querying datasets and give developers an easy API to work with. Streams are similar to LINQ in many ways, but there are still some differences. Because Java only introduced streams in 2014, the way they simplify querying sets of data can seem a little half-hearted to a .NET developer (LINQ was already introduced in 2007, with .NET Framework 3.5).
Nonetheless, it is interesting to take a look at the differences and similarities between both ways of querying data sources. Though LINQ in itself is defined both as a language construct and as a library, we will limit ourselves to the library part since it bears the most similarities to Java streams. We will also focus on LINQ to Objects only.
1. LINQ and Java stream operations
All LINQ and Java stream operations are part of one of three groups. These are:
- Fetch the data
- Create a query
- Execute the query
The following will give a more detailed explanation of each of these three groups.
Fetch the data
First, we need to fetch the data that we are going to manipulate. The source of the data doesn’t matter; what is important is that the result set we are going to work on implements the IEnumerable<T> interface (for LINQ) or the Collection<T> interface (for Java), either directly or via its parents. This way we know that we are working with a collection that can be manipulated.
Create a query
After you have fetched the data that you want to manipulate, you can write a query using that dataset. A query is just that: a list of criteria that specifies what subset of the data you want to retrieve. There is one major difference between Java and .NET related to the querying of data. In .NET, queries use deferred execution: a query only runs when its results are enumerated, so if you enumerate the same query ten times, it will be executed ten times. By contrast, if you do the same in Java, the JVM will throw an exception on the second use.
This decision was made when the engineers for Java 8 were designing the stream model while working on JSR-335: the designers chose to throw an exception whenever you reuse a stream that has already been consumed or closed. The .NET team, in contrast, chose not to enforce this and put the entire responsibility on the developer who uses it.
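To make the Java side of this concrete, here is a minimal sketch. The class and method names are hypothetical, and it relies on the standard stream implementation throwing an IllegalStateException when it detects that a stream is being reused.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamReuse {
    // Hypothetical helper: returns true if a second terminal operation
    // on the same stream object is rejected.
    static boolean secondUseThrows(List<Integer> data) {
        Stream<Integer> query = data.stream().filter(x -> x > 5);
        query.count();              // first terminal operation consumes the stream
        try {
            query.count();          // second use of the same stream object
            return false;
        } catch (IllegalStateException e) {
            return true;            // the stream was already operated upon
        }
    }

    // The usual way around this: build a fresh pipeline for every execution.
    static long countGreaterThanFive(List<Integer> data) {
        return data.stream().filter(x -> x > 5).count();
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(secondUseThrows(data));
        System.out.println(countGreaterThanFive(data));
        System.out.println(countGreaterThanFive(data)); // safe to call repeatedly
    }
}
```

Wrapping the pipeline in a method (or a Supplier) is probably the closest Java gets to a reusable LINQ query: every call creates a new stream from the source.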
Execute the query
You probably also want to do something with the subset of data you just retrieved: you can retrieve the objects in full or transform them using reduce functionality. We should always make sure that side effects are either intentional or avoided. Both Java and .NET will happily let you introduce side effects on the objects themselves, but will throw exceptions when you try to modify the collection that you are currently iterating over.
persons.ForEach(x => x.Name = "C"); is OK, since it only affects the object inside the collection.
persons.ForEach(x => persons.Add(new Developer("C"))); will throw an InvalidOperationException, because the collection itself was modified.
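The same two cases can be sketched in Java. The Person class and the method names below are hypothetical stand-ins, and the sketch assumes an ArrayList, whose forEach fails with a ConcurrentModificationException when the list is structurally modified during iteration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class SideEffects {
    // Hypothetical, deliberately mutable type for illustration only.
    static class Person {
        String name;
        Person(String name) { this.name = name; }
    }

    static List<Person> samplePersons() {
        List<Person> persons = new ArrayList<>();
        persons.add(new Person("A"));
        persons.add(new Person("B"));
        return persons;
    }

    // Changing the elements is allowed: the list itself is untouched.
    static boolean mutatingElementsIsAllowed() {
        List<Person> persons = samplePersons();
        persons.forEach(p -> p.name = "C");
        return persons.stream().allMatch(p -> p.name.equals("C"));
    }

    // Adding to the list while iterating over it is rejected.
    static boolean mutatingCollectionThrows() {
        List<Person> persons = samplePersons();
        try {
            persons.forEach(p -> persons.add(new Person("C")));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mutatingElementsIsAllowed());
        System.out.println(mutatingCollectionThrows());
    }
}
```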
The following basic examples give you a general idea of what you can do with LINQ and streams. Because, let’s be honest, code does speak louder than words.
In .NET (always the first example), dataset is the “fetch the data” part: in our case a static array of integers. The “create a query” part of the statement is the Where() clause: here we say that we only want to consider the elements that are greater than 5. Finally, we execute the query by calling Sum(); this tells the application that we want to take the sum of every element that comes out of the query part (in our case 6 + 7 + 8 + 9 + 10 = 40).
In Java (always the second one), we can see similarities: data is the fetched data, and the query part consists of both filter and mapToInt. We call sum() to execute the query and get the same result. Notice how, in this case, we create two intermediary streams: one contains the results of the filter operation and the second one contains the same elements, but is returned as a stream of primitive int values.
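As a rough reconstruction of what the Java version described here could look like, the sketch below assumes that data holds the integers 1 through 10. The class and method names are hypothetical; filter, mapToInt and sum are the standard stream operations, and mapToInt turns the Stream<Integer> into an IntStream.

```java
import java.util.Arrays;
import java.util.List;

class SumGreaterThanFive {
    // Hypothetical helper: sums every element greater than 5.
    static int sumGreaterThanFive(List<Integer> data) {
        return data.stream()                     // fetch: open a stream on the collection
                .filter(x -> x > 5)              // query: keep elements greater than 5
                .mapToInt(Integer::intValue)     // query: Stream<Integer> to IntStream
                .sum();                          // execute: terminal operation
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(sumGreaterThanFive(data)); // 6 + 7 + 8 + 9 + 10 = 40
    }
}
```

Nothing is computed until sum() is called: filter and mapToInt only describe the pipeline.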
Some of the methods in .NET are not available in Java. We will briefly explore two of these methods.
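First, .NET provides the developer with two distinct functions to filter the initial dataset: Where() and TakeWhile(). The difference between the two is that TakeWhile() stops as soon as the condition returns false, whereas Where() keeps going and evaluates every element. For example, applied to the sequence 1, 2, 3, 1 with the condition "less than 3", TakeWhile() yields 1, 2 while Where() yields 1, 2, 1.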
This is relevant, since a malformed query can cause performance issues, for example when we have a huge dataset that is streamed on demand or a dataset that is potentially infinite, such as a stream of prime numbers. Java 8 has no equivalent method, but one is added in JDK 9.
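For readers on JDK 9 or later, the difference can be sketched with Stream.takeWhile and filter. The class and method names are hypothetical, and the code will not compile on Java 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TakeWhileVsFilter {
    // filter (the counterpart of Where) tests every element.
    static List<Integer> withFilter(List<Integer> data) {
        return data.stream().filter(x -> x < 3).collect(Collectors.toList());
    }

    // takeWhile stops at the first element that fails the condition.
    static List<Integer> withTakeWhile(List<Integer> data) {
        return data.stream().takeWhile(x -> x < 3).collect(Collectors.toList());
    }

    // On an infinite stream, takeWhile terminates; filter alone would never finish.
    static List<Integer> naturalsBelow(int limit) {
        return Stream.iterate(1, x -> x + 1)
                .takeWhile(x -> x < limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 1);
        System.out.println(withFilter(data));     // [1, 2, 1]
        System.out.println(withTakeWhile(data));  // [1, 2]
        System.out.println(naturalsBelow(5));     // [1, 2, 3, 4]
    }
}
```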
A second example is OfType(), which filters a collection down to the elements of a given type. Java lacks a function like this, but you can work around it by combining methods that streams already provide.
This returns a collection of developers. In this case, the subset contains “Maarten” and “Robin” while leaving out “Frank” and “Hans”.
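One possible workaround, sketched under the assumption that Developer is a subclass of Person (both classes and the ofType helper are hypothetical), combines filter with Class::isInstance and map with Class::cast:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

class OfTypeWorkaround {
    // Hypothetical class hierarchy for illustration.
    static class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static class Developer extends Person {
        Developer(String name) { super(name); }
    }

    // Keeps only the elements that are instances of the given type.
    static <T> List<T> ofType(Collection<?> source, Class<T> type) {
        return source.stream()
                .filter(type::isInstance)   // drop everything that is not a T
                .map(type::cast)            // safe cast, since the filter already checked
                .collect(Collectors.toList());
    }

    static List<String> developerNames() {
        List<Person> persons = Arrays.asList(
                new Developer("Maarten"), new Person("Frank"),
                new Developer("Robin"), new Person("Hans"));
        return ofType(persons, Developer.class).stream()
                .map(d -> d.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(developerNames()); // [Maarten, Robin]
    }
}
```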
This returns a list with the same contents as the array. We can test the difference by using these two checks:
Both streams and LINQ support parallel processing, the former using .parallelStream() and the latter using .AsParallel(). .NET supports this from .NET 4.0 onwards with the “PLINQ” execution engine. Mostly they do what you think they do, namely process data in parallel, but .NET has one pitfall compared to Java: there is no guaranteed order in which the statements are executed, unless you explicitly ask PLINQ to preserve the ordering.
When you use behavioural parameters that are stateless and non-interfering, Java guarantees that the computation is the same, whether you run it in parallel or sequentially. As always, it’s important to make sure that the processing you do doesn’t introduce any unknown side effects.
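A small sketch of that guarantee, with hypothetical class and method names: the lambdas below are stateless and do not touch the source, so the parallel pipeline should produce the same result as the sequential one, and collecting an ordered stream keeps the encounter order even when it runs in parallel.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreams {
    // Sum of the even numbers from 1 to 1000, computed sequentially.
    static int sequentialSum() {
        return IntStream.rangeClosed(1, 1000).filter(x -> x % 2 == 0).sum();
    }

    // The same pipeline, executed in parallel.
    static int parallelSum() {
        return IntStream.rangeClosed(1, 1000).parallel().filter(x -> x % 2 == 0).sum();
    }

    // Collecting an ordered stream preserves encounter order, even in parallel.
    static List<Integer> parallelSquares() {
        return IntStream.rangeClosed(1, 5)
                .parallel()
                .map(x -> x * x)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum() == parallelSum()); // true
        System.out.println(parallelSquares());                // [1, 4, 9, 16, 25]
    }
}
```

Side-effecting terminal operations are a different matter: forEach on a parallel stream makes no ordering promise, which is why forEachOrdered exists.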
Both technologies can be used to speed up and simplify the development process. One should, however, pay attention to the details mentioned above, because careless queries can lead to code that is both faulty and slow. Especially consider the “hidden” behaviour in .NET, like deferred execution, and possible side effects during parallel processing.
More information on LINQ can be found, as with all things .NET, on the documentation part of the MSDN site. For more information on Java streams, see the API documentation at the Oracle site. If there's another topic you'd like to see supported in comparing Java to .NET, do leave a comment.