Comparison of Java and .NET: String.equals and interning

As a developer proficient in C# and JavaScript, I keep running into the same puzzle whenever I read Java code: the use of String.equals(). The rationale is clear, and it is a fundamental concept for any Java programmer, but it still strikes me as peculiar. Even more peculiar is that using “==” to compare strings in Java can yield different results depending on implementation details such as string interning.

Let’s delve into this. At its core, comparison can have two distinct meanings: identity/reference equality and value equality. This post puts it nicely:

  • Identity (reference equality): Two objects are considered identical if they are, in fact, the same object residing in memory. This means their references point to the same memory location.
  • Equivalence (value equality): Two objects are considered equivalent if they hold the same value or values.

When comparing value/primitive types in C#, Java, or JavaScript, we expect an equivalence comparison. Conversely, when comparing reference types (objects), we primarily expect a reference comparison. Two things complicate matters, however: boxing (and caching) and strings (and interning).
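To make the two expectations concrete in Java, here is a minimal sketch (the class and method names are hypothetical): primitives compare by value, while two distinct objects with identical contents compare unequal under ==. StringBuilder is used because it does not override equals(), so even equals() falls back to Object's identity check.

```java
// Hypothetical demo class: primitive vs. reference comparison in Java.
class EqualityKinds {

    // Primitives: == compares the values themselves.
    static boolean primitivesEqual() {
        int a = 1000;
        int b = 1000;
        return a == b; // true: value comparison
    }

    // Objects: == compares references, i.e. "is this the very same object?"
    static boolean sameReference() {
        StringBuilder x = new StringBuilder("hi");
        StringBuilder y = new StringBuilder("hi");
        return x == y; // false: two distinct objects with the same content
    }

    // StringBuilder inherits Object.equals(), which is itself an identity check.
    static boolean builderEquals() {
        StringBuilder x = new StringBuilder("hi");
        StringBuilder y = new StringBuilder("hi");
        return x.equals(y); // false
    }

    public static void main(String[] args) {
        System.out.println(primitivesEqual()); // true
        System.out.println(sameReference());   // false
        System.out.println(builderEquals());   // false
    }
}
```

String is the notable exception among Java reference types: it does override equals() to compare characters, which is exactly why the rest of this post is about it.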

Strings, being objects/reference types (except in JavaScript, where they are primitive types), often demand value semantics during comparison. In essence, the concern is usually not whether two strings are the exact same instance in memory, but rather if they hold the same value. This is where Java and C# diverge in their approaches.

C# gives strings value semantics: comparing two strings with “==” performs a value comparison. This works because the String class overloads the “==” operator; without that overload, “==” would default to reference comparison, as it does for other objects. Because overloaded operators are static methods, the overload is chosen at compile time from the static types of the operands, which leads to intriguing consequences, as sharply explained by Eric Lippert:

// C# code:
using System;

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;
Console.WriteLine(obj == str1);  // true (object == is reference equality; both are the same interned literal)
Console.WriteLine(str1 == str2); // true (string == overload: value equality)
Console.WriteLine(obj == str2);  // false!? (reference equality; str2 is a separate, non-interned instance)
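Java has no user-defined operators, but its method overloading is resolved the same way: at compile time, from the declared types of the arguments. The sketch below mimics the C# example with two hypothetical static methods named same: the String overload compares values, the Object overload compares references, and which one runs depends only on how the variables are declared. It uses new String to guarantee a non-interned instance instead of relying on any particular runtime's behavior.

```java
// Hypothetical illustration: overloads are chosen from declared types at compile time,
// mirroring how C# picks the string == operator vs. the object == operator.
class StaticOverloads {

    // Chosen when both arguments are declared as String: value comparison.
    static boolean same(String a, String b) {
        return a.equals(b);
    }

    // Chosen when either argument is declared as Object: reference comparison.
    static boolean same(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        Object obj = "Int32";              // literal, interned
        String str1 = "Int32";             // same interned literal
        String str2 = new String("Int32"); // distinct, non-interned instance

        System.out.println(same(obj, str1));  // true: Object overload, same instance
        System.out.println(same(str1, str2)); // true: String overload, same characters
        System.out.println(same(obj, str2));  // false: Object overload, different instances
    }
}
```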

Java has no operator overloading (a feature I have mixed feelings about; I don't necessarily consider its absence a drawback), so it has no straightforward way to give “==” a different meaning for strings than for other objects. Consequently, str1 == str2 in Java performs a reference comparison, and String.equals() is needed for value comparison. Interestingly, due to interning, “==” can sometimes appear to perform a value comparison. For instance:

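// Java code:
public class StringEq {
    public static void main(String[] args) {
        String s1 = "hi";                  // string literal, so it's interned
        String s2 = "hi";                  // string literal, so it's interned
        System.out.println(s1 == s2);      // true: same pooled instance
        String s3 = new String("hi");      // new always creates a distinct, non-interned object
        System.out.println(s2 == s3);      // false: different objects
        System.out.println(s2.equals(s3)); // true: same characters
    }
}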

Here, s1 and s2 are interned and point to the same instance in the string pool, so the reference comparison returns true. s3, however, is a separate object that was not interned, so the comparison returns false. This concept is succinctly summarized in this Stack Overflow answer:

== tests for reference equality.
.equals() tests for value equality.

Therefore, to check whether two strings hold the same value, use .equals(), unless equal values are guaranteed to share a single object, as is the case when both strings are known to be interned.
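The interning exception can be sketched in Java as follows (class and method names are hypothetical). String.intern() returns the canonical pooled instance, and compile-time constant expressions such as "h" + "i" are treated like literals, whereas strings concatenated at run time are new objects, as described in the Java Language Specification's section on string literals.

```java
// Hypothetical demo: when == happens to agree with equals() for strings.
class InterningDemo {

    static final String LITERAL = "hi";

    // A compile-time constant expression is folded and interned like a literal.
    static boolean constantConcat() {
        return ("h" + "i") == LITERAL; // true
    }

    // Concatenation at run time creates a new String object.
    static boolean runtimeConcat() {
        String h = "h";              // not declared final, so h + "i" is not a constant expression
        return (h + "i") == LITERAL; // false
    }

    // intern() returns the pooled instance with the same value.
    static boolean internedCopy() {
        String copy = new String("hi");  // distinct object
        return copy.intern() == LITERAL; // true
    }

    public static void main(String[] args) {
        System.out.println(constantConcat()); // true
        System.out.println(runtimeConcat());  // false
        System.out.println(internedCopy());   // true
        // equals() returns true in all three cases, which is why it is the safe default.
    }
}
```

The runtimeConcat case is the one that tends to surprise: the value is identical, but nothing forces it into the pool, so == quietly returns false.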

Conversely, in C#, whether a string is interned has no bearing on the “==” comparison. Because it performs value equality, where the strings live in memory (in the intern pool or elsewhere) is irrelevant. Interestingly, C#’s “==” (which ultimately invokes String.Equals()) begins with a reference check, so it can return true immediately when both operands are the same instance, as with two interned strings, skipping the character-by-character comparison. This aspect is brilliantly explained here, where the author intriguingly presents interning as a processing optimization in addition to a memory optimization, and even suggests calling string.Intern() before switch statements.
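The identity-first idea is easy to express in Java too; as far as I know, OpenJDK's own String.equals also starts with a this == anObject check. Below is a simplified sketch, with a hypothetical method name, of a value comparison that short-circuits on identity before falling back to a character-by-character loop. It is not the actual library implementation, which works on the internal byte array.

```java
// Hypothetical sketch of value equality with an identity fast path.
class FastEquals {

    static boolean valueEquals(String a, String b) {
        if (a == b) {
            return true;  // same instance (e.g. both interned), or both null: no scan needed
        }
        if (a == null || b == null || a.length() != b.length()) {
            return false; // cheap rejections
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false; // first differing char ends the comparison
            }
        }
        return true;      // same length, same chars
    }

    public static void main(String[] args) {
        String pooled = "hello";
        String copy = new String("hello");
        System.out.println(valueEquals(pooled, "hello")); // true via the identity check
        System.out.println(valueEquals(pooled, copy));    // true via the char loop
        System.out.println(valueEquals(pooled, "help!")); // false
    }
}
```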

Boxing also comes into play when discussing equality. As expected, in both C# and JavaScript, comparing two boxed integers (numbers) with “==” is a reference comparison, so it yields false even when the values are equal: for example, (object)1 == (object)1 in C#, or new Number(1) == new Number(1) in JavaScript.

While I haven’t personally tested this in Java (lacking a readily available REPL and not inclined to write a full Program.java for this purpose), this Stack Overflow discussion presents a perplexing scenario: due to caching, the result for small numbers is true. The relevant rule in the Java Language Specification (§5.1.7, boxing conversion) reads:

If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
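The rule can be checked with a small program like the one below (the statements can also be pasted into jshell, the REPL shipped with the JDK since Java 9). Autoboxing compiles to Integer.valueOf, which must return cached instances for -128 to 127. Outside that range the result of == is not guaranteed either way, since on HotSpot the cache's upper bound can be raised (for example with -XX:AutoBoxCacheMax), so the sketch only relies on the guaranteed case and uses equals() for the rest. The class and method names are hypothetical.

```java
// Hypothetical demo of the boxing cache rule quoted above.
class BoxingCache {

    // 127 is inside the range the JLS guarantees to be cached,
    // so both boxing conversions yield the same Integer object.
    static boolean smallBoxedSame() {
        Integer a = 127; // autoboxing, compiled to Integer.valueOf(127)
        Integer b = 127;
        return a == b;   // always true
    }

    // 1000 is outside the guaranteed range: == may be true or false
    // depending on the JVM configuration, so don't rely on it.
    static boolean largeBoxedSame() {
        Integer c = 1000;
        Integer d = 1000;
        return c == d;   // false with default HotSpot settings
    }

    // equals() compares the wrapped values and is always reliable.
    static boolean largeBoxedEqual() {
        Integer c = 1000;
        Integer d = 1000;
        return c.equals(d); // always true
    }

    public static void main(String[] args) {
        System.out.println(smallBoxedSame());  // true
        System.out.println(largeBoxedSame());  // implementation-dependent
        System.out.println(largeBoxedEqual()); // true
    }
}
```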

Licensed under CC BY-NC-SA 4.0
Last updated on Mar 09, 2024 23:48 +0100