Casting, conversion, and coercion

The concepts of casting, conversion, and coercion can be easily mixed up, especially when talking about them across different programming languages. C# adds to this confusion with its blend of casting and conversion. This is evident in the numerous questions on Stack Overflow [[1]], [[2]]…

My understanding of this topic has evolved thanks to recent research. I previously saw casting (both upcasting and downcasting) as a way for developers to guide the compiler. Upcasting was like saying, “Hey, even though I’m a Dog object, treat me as an Animal and call the Animal version of this method.” Downcasting was more like, “Hey, even though you think I’m just an Animal, I’m actually a Dog. Check to make sure at runtime, and if I’m right, you can call Dog-specific methods. If I’m wrong, throw an error.” In C#, a downcast makes the compiler emit a ‘castclass’ instruction in the IL code, which performs that runtime type check.

using System;

class Animal
{
 public void DoNoise()
 {
  Console.WriteLine("Animal.DoNoise");
 }
 public virtual void Move()
 {
  Console.WriteLine("Animal.Move");
 }
}

class Dog: Animal
{
 public new void DoNoise()
 {
  Console.WriteLine("Dog.DoNoise");
 }
 
 public override void Move()
 {
  Console.WriteLine("Dog.Move");
 }
 
 public void Bark()
 {
  Console.WriteLine("Dog.Bark");
 }
}

class App
{
 public static void Main()
 {
  Dog d1 = new Dog();
  ((Animal)d1).DoNoise();
  // upcasting and non-virtual method, prints: Animal.DoNoise

  ((Animal)d1).Move();
  // prints: Dog.Move
  // as this is a virtual method we have dynamic binding here, so the cast has no effect
  // and it still calls Dog.Move

  Animal a1 = d1;
  ((Dog)a1).Bark();
  // prints: Dog.Bark
  // the compiler adds a runtime check, so at runtime a1 is verified to be a Dog, and Bark is invoked
 }
}
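The listing above only shows a downcast that succeeds. As a minimal sketch of the failing case, the following stand-alone program (the `DowncastDemo` class and `BarkOrFail` helper are hypothetical names introduced here, not part of the original example) casts a plain Animal to Dog and reports the exception raised by the runtime check:

```csharp
using System;

class Animal { }

class Dog : Animal
{
    public string Bark() => "Dog.Bark";
}

static class DowncastDemo
{
    // Returns the result of Bark, or the exception type name when the cast fails.
    public static string BarkOrFail(Animal a)
    {
        try
        {
            // The compiler accepts the cast; the type check runs when this line executes.
            return ((Dog)a).Bark();
        }
        catch (InvalidCastException)
        {
            return "InvalidCastException";
        }
    }

    public static void Main()
    {
        Console.WriteLine(BarkOrFail(new Dog()));    // prints: Dog.Bark
        Console.WriteLine(BarkOrFail(new Animal())); // prints: InvalidCastException
    }
}
```

Both calls compile, because the static type Animal leaves open the possibility that the object is a Dog. Only the runtime check can tell the two cases apart.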

To me, this was completely separate from “conversion.” I thought casting was just about instructing the compiler to handle an object as a different type. However, C# complicates things by using the same cast operator for both this hinting mechanism and actual data type conversions. Note that C# [defines a set] of built-in explicit conversions, and that we can also [define our own implicit (no cast used) or explicit (using the cast syntax) conversions]. This can lead to surprising code if you’re used to casting only in the context of subtypes:

class Table
{
    // user-defined explicit conversion from Table to Dog
    public static explicit operator Dog(Table t)
    {
        return new Dog();
    }
}

// ...
Dog d = (Dog)new Table();
// ...

Eric Lippert explains this duality very well on Stack Overflow and his blog:

A “cast” uses the cast operator to tell the compiler one of two things: either (1) this expression is not known to be of this specific type, but trust me, its value will be of that type at runtime (the compiler should treat it as that type, and if that’s wrong at runtime, we’ll get an error), or (2) this expression has a completely different type, but there’s a well-known way to associate instances of its current type with instances of the target type. The compiler should then generate code to make that conversion happen. It’s interesting to note that these are opposites, which I find quite clever.

  • Your code might have an expression of type B, but you have knowledge the compiler doesn’t. You’re confident that during execution, this object of type B will always be of a derived type D. By inserting a cast to D, you inform the compiler of this assumption. Since the compiler probably can’t verify your claim, it may insert a runtime check to make sure it holds. An exception is thrown if you were wrong.
  • You have an expression of type T that you know is definitely not of type U. However, you have a way to link some or all values of T to a corresponding value of U. You can tell the compiler to generate code for this conversion by adding a cast to U. Again, an exception is thrown if no matching U value exists for the T value at runtime.

So, a cast can trigger a conversion (compiler adds code), do nothing (upcasting), or add a type check (‘castclass’ in IL).
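As a rough illustration of those three outcomes, here is a small self-contained sketch. The class and method names (`CastOutcomes`, `Truncate`, `Upcast`, `Downcast`) are made up for this example; the comments describe what I understand the compiler to do in each case.

```csharp
using System;

class Animal { }
class Dog : Animal { }

static class CastOutcomes
{
    // 1. Conversion: the compiler emits code that produces a new value.
    public static int Truncate(double value) => (int)value;

    // 2. Upcast: no runtime work is needed, only the static type changes.
    public static Animal Upcast(Dog dog) => (Animal)dog;

    // 3. Downcast: the runtime verifies the type and throws if the claim is wrong.
    public static Dog Downcast(Animal animal) => (Dog)animal;

    public static void Main()
    {
        Console.WriteLine(Truncate(3.99));                     // prints: 3

        Dog dog = new Dog();
        Animal animal = Upcast(dog);
        Console.WriteLine(object.ReferenceEquals(dog, animal)); // prints: True

        Dog same = Downcast(animal);
        Console.WriteLine(object.ReferenceEquals(dog, same));   // prints: True
    }
}
```

In the second and third cases the reference itself is unchanged, which is why ReferenceEquals reports True. Only the first case builds a different value.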

My earlier explanation only covers the first point. As Eric points out, it’s better to see casting as syntax with two different meanings. While he calls it a “neat trick,” it can definitely be confusing. If I want to convert an object, I’d rather be explicit and use something like Convert.DoWhateverConversion.

People seem to use “conversion” for both cases: when we’re guiding the compiler (identity conversion) and when an object’s data is transformed (like converting a double to an int).

A “conversion” is when a value of one type is handled as a value of another (often, but not always, different). An “identity conversion” is still technically a conversion. This can involve changing how the data is represented (like int to double) or keeping the representation the same (like string to object). Conversions can be “implicit” (no cast needed) or “explicit” (require a cast).
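The distinctions in that definition can be sketched in a few lines. This is only an illustration under my own naming (`ConversionKinds`, `AsObject`, `Widen`, `Narrow` are hypothetical helpers); it uses BitConverter.GetBytes to make the change of representation visible.

```csharp
using System;

static class ConversionKinds
{
    // Implicit, representation-preserving: the same reference under a more general static type.
    public static object AsObject(string s) => s;

    // Implicit, representation-changing: a 4-byte integer becomes an 8-byte floating-point value.
    public static double Widen(int i) => i;

    // Explicit, representation-changing: the cast is required because data can be lost.
    public static int Narrow(double d) => (int)d;

    public static void Main()
    {
        string text = "hello";
        Console.WriteLine(object.ReferenceEquals(text, AsObject(text))); // prints: True

        int whole = 42;
        double real = Widen(whole);
        Console.WriteLine(BitConverter.GetBytes(whole).Length); // prints: 4
        Console.WriteLine(BitConverter.GetBytes(real).Length);  // prints: 8

        Console.WriteLine(Narrow(real));                        // prints: 42
    }
}
```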

This clarifies the difference between a cast and as in C#. As [explained here], as only covers the first meaning of a cast, not the second (so “conversion” here refers to “identity conversions”).

The “as” operator only works with reference, boxing, and unboxing conversions.
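To make the contrast concrete, here is a hedged sketch that reuses the idea of a Table with a user-defined explicit conversion to Dog. The `AsVersusCast` class and its `ViaCast` and `ViaAs` helpers are hypothetical names for this example only.

```csharp
using System;

class Dog { }

class Table
{
    // User-defined explicit conversion from Table to Dog.
    public static explicit operator Dog(Table t) => new Dog();
}

static class AsVersusCast
{
    // The cast resolves the user-defined operator at compile time and calls it.
    public static Dog ViaCast(Table table) => (Dog)table;

    // 'as' only tests the runtime type; it never calls a user-defined conversion.
    public static Dog ViaAs(object value) => value as Dog;

    public static void Main()
    {
        Table table = new Table();

        Console.WriteLine(ViaCast(table) != null); // prints: True

        // Dog d = table as Dog;                   // does not compile: 'as' ignores user-defined conversions

        Console.WriteLine(ViaAs(table) == null);   // prints: True, because a Table is not a Dog
    }
}
```

Where the cast would throw on a failed type check, as yields null instead, so the caller has to test the result before using it.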

Another relevant concept, often encountered in JavaScript, is coercion. I’ve read a clear definition: “Coercion is an implicit conversion that changes how data is represented.” [This post] explains JavaScript coercion well. This means we could call [C#’s implicit conversions] coercions. Keep in mind that these involve changing the underlying data, not just hinting to the compiler.

short s = 5;
s.GetType();
//System.Int16, so this representation now takes up 2 bytes
int i = s;
i.GetType();
//System.Int32, so this representation now takes up 4 bytes

To learn more, I recommend these posts: [[1]] and [[2]].

Licensed under CC BY-NC-SA 4.0
Last updated on Aug 15, 2023 20:17 +0100