A number of CLR language features changed from Managed Extensions
for C++ to Visual C++ 2008.
The changes described in this section are a sort of language
miscellany. They include a change in the handling of string
literals, a change in the overload resolution between an ellipsis
and the ParamArray attribute, the change of __typeof to typeid, a
change in the calling order of constructor initializer lists, and
the introduction of a new cast notation, that of safe_cast.
String Literal
Discusses how the handling of string literals has changed.
Param Array and Ellipsis
Discusses how ParamArray is now given precedence over the ellipsis
(...) for resolving function calls with varying numbers of
arguments.
typeof Goes to T::typeid
Discusses how the typeof
operator has been supplanted by typeid.
Initializer Lists
Discusses changes in the calling order of initializer lists.
Cast Notation and Introduction of safe_cast<>
Discusses changes to cast notation and in particular the
introduction of safe_cast.
String Literal
The handling of string literals has changed from Managed Extensions
for C++ to Visual C++ 2008.
In the Managed Extensions for C++
language design, a managed string literal was indicated by prefacing
the string literal with an S. For example:
String *ps1 = "hello";
String *ps2 = S"goodbye";
The performance overhead between the two initializations turns
out to be non-trivial, as their CIL representations demonstrate
when seen through ildasm.
That's a remarkable savings for just remembering (or learning) to
prefix a literal string with an S. In the new syntax, the handling
of string literals is made transparent, determined by the context
of use. The S no longer needs to be specified.
What about cases in which we need to explicitly direct the
compiler to one interpretation or another? In these cases, we apply
an explicit cast. For example:
f( safe_cast<String^>("ABC") );
Moreover, the string literal now matches a String with a trivial
conversion rather than a standard conversion. While this may not
sound like much, it changes the resolution of overloaded function
sets that include a String and a const char* as competing formal
parameters. A call that once resolved to the const char* instance
is now flagged as ambiguous. For example:
ref struct R {
    void f(const char*);
    void f(String^);
};
int main () {
    R r;
    // old syntax: f( const char* );
    // new syntax: error: ambiguous
    r.f("ABC");
}
Why is there a difference? Since more than one instance named
f exists within the program, the function overload resolution
algorithm must be applied to the call. The formal resolution of an
overloaded function involves three steps.
1. The collection of the candidate functions. The candidate
functions are those methods within the scope that lexically
match the name of the function being invoked. For example, since
f() is invoked through an instance of R, all functions named f
that are not members of R (or of its base class hierarchy) are
not candidate functions. In our example, there are two candidate
functions. These are the two member functions of R named f. A
call fails during this phase if the candidate function set is
empty.
2. The selection of the set of viable functions from among the
candidate functions. A viable function is one that can be invoked
with the arguments specified in the call, given the number of
arguments and their types. In our example, both candidate
functions are also viable functions. A call fails during this
phase if the viable function set is empty.
3. The selection of the function that represents the best match of
the call. This is done by ranking the conversions applied to
transform the arguments to the types of the viable function
parameters. This is relatively straightforward with a single
parameter function; it becomes somewhat more complex when there
are multiple parameters. A call fails during this phase if there
is no best match, that is, if the conversions necessary to
transform the type of the actual argument to the type of the
formal parameter are equally good for more than one viable
function. The call is then flagged as ambiguous.
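The first two steps can be sketched in ISO C++, without /clr. This is only an illustration under assumed names: the free function f(int), the second member f(double), and the two helper functions are hypothetical and do not appear in the example above.

```cpp
#include <string>

// A free function named f. It is NOT a candidate for r.f(...): member
// lookup considers only R and its base classes.
inline std::string f(int) { return "free f(int)"; }

struct R {
    std::string f(const char*) { return "R::f(const char*)"; }
    std::string f(double)      { return "R::f(double)"; }
};

inline std::string call_with_literal() {
    R r;
    // Candidates: R::f(const char*) and R::f(double).
    // Viable: only R::f(const char*), because a string literal
    // cannot be converted to double.
    return r.f("ABC");
}

inline std::string call_with_int() {
    R r;
    // Candidates: the same two members; the free f(int) is never seen.
    // Viable: only R::f(double), because 42 is not a null pointer
    // constant and so cannot convert to const char*.
    return r.f(42);
}
```

Calling call_with_literal() selects the const char* member and call_with_int() selects the double member, even though the free f(int) would have been an exact match for 42 had it been a candidate.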
In Managed Extensions, the resolution of this call invoked the
const char* instance as the best match. In the new syntax, the
conversions necessary to match "ABC" to const char* and to String^
are now equivalent (that is, equally good), and so the call is
flagged as bad (that is, as ambiguous).
This leads us to two questions:
1. What is the type of the actual argument, "ABC"?
2. What is the algorithm for determining when one type
conversion is better than another?
The type of the string literal "ABC" is const char[4]; remember,
there is an implicit null terminating character at the end of
every string literal.
The algorithm for determining when one type conversion is better
than another involves placing the possible type conversions in a
hierarchy. Here is my understanding of that hierarchy; all these
conversions, of course, are implicit. Using an explicit cast
notation overrides the hierarchy, similar to the way parentheses
override the usual operator precedence of an expression.
1. An exact match is best. Surprisingly, for an argument to be
an exact match, it does not need to exactly match the parameter
type; it just needs to be close enough. This is the key to
understanding what is going on in this example, and how the
language has changed.
2. A promotion is better than a standard conversion. For
example, promoting a short int to an int is better than
converting an int into a double.
3. A standard conversion is better than a boxing conversion.
For example, converting an int into a double is better than
boxing an int into an Object.
4. A boxing conversion is better than an implicit user-defined
conversion. For example, boxing an int into an Object is better
than applying a conversion operator of a SmallInt value class.
5. An implicit user-defined conversion is better than no
conversion at all. An implicit user-defined conversion is the
last exit before Error (with the caveat that the formal
signature might contain a param array or ellipsis at that
position).
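The native part of this hierarchy can be observed in ISO C++. The sketch below is an assumption-laden illustration: boxing does not exist outside /clr and is left out, SmallInt is modeled as a plain struct with a converting constructor rather than a value class, and the function names p, s, and u are hypothetical.

```cpp
#include <string>

// Stand-in for the SmallInt value class: an implicit user-defined
// conversion from int (hypothetical, ISO C++ only).
struct SmallInt {
    SmallInt(int) {}
};

// Promotion versus standard conversion.
inline std::string p(int)    { return "p(int)"; }
inline std::string p(double) { return "p(double)"; }

// Standard conversion versus implicit user-defined conversion.
inline std::string s(double)   { return "s(double)"; }
inline std::string s(SmallInt) { return "s(SmallInt)"; }

// Implicit user-defined conversion versus ellipsis.
inline std::string u(SmallInt) { return "u(SmallInt)"; }
inline std::string u(...)      { return "u(...)"; }

inline std::string rank_promotion() {
    short v = 1;
    return p(v);  // short -> int is a promotion; short -> double is not
}

inline std::string rank_standard() {
    return s(1);  // int -> double beats int -> SmallInt
}

inline std::string rank_user_defined() {
    return u(1);  // int -> SmallInt beats the ellipsis
}
```

Each helper returns the name of the overload that wins, so the ordering in the list can be checked one rung at a time.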
So, what does it mean to say that an exact match isn't
necessarily exactly a match? For example, const
char[4] does not exactly match either
const char* or String^, and yet the
ambiguity of our example is between two conflicting exact matches!
An exact match, as it happens, includes a number of trivial
conversions. There are four trivial conversions under ISO-C++ that
can be applied and still qualify as an exact match. Three are
referred to as lvalue transformations. A fourth type is called a
qualification conversion. The three lvalue transformations are
treated as a better exact match than one requiring a qualification
conversion.
One form of the lvalue transformation is the
native-array-to-pointer conversion. This is what is involved in
matching a const char[4] to const char*. Therefore, the match of
f("ABC") to f(const char*) is an exact match. In the earlier
incarnations of our language, this was in fact the best match.
For the compiler to flag the call as ambiguous, therefore, the
conversion of a const char[4] to a String^ must also be an exact
match through a trivial conversion. This is the change that has
been introduced in the new language version, and this is why the
call is now flagged as ambiguous.
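The ranking among exact matches can be seen in ISO C++ as well. In this sketch the overload sets g and h are hypothetical, and std::string stands in for a string class reached through a user-defined conversion; it behaves like String in Managed Extensions, not like String^ in the new syntax, where the call is ambiguous.

```cpp
#include <string>

// Two exact matches for a char[4] argument.
inline std::string g(char*)       { return "g(char*)"; }
inline std::string g(const char*) { return "g(const char*)"; }

// An exact match versus a user-defined conversion.
inline std::string h(const char*) { return "h(const char*)"; }
inline std::string h(std::string) { return "h(std::string)"; }

inline std::string pick_g() {
    char buf[4] = "ABC";  // type char[4], including the null terminator
    // char[4] -> char*       : array-to-pointer conversion only
    // char[4] -> const char* : array-to-pointer plus a qualification
    //                          conversion, so it ranks lower
    return g(buf);
}

inline std::string pick_h() {
    // "ABC" has type const char[4]; the array-to-pointer conversion is
    // an exact match and beats constructing a std::string.
    return h("ABC");
}
```

pick_g() shows an lvalue transformation beating a qualification conversion, and pick_h() shows the old-syntax outcome in which the const char* instance wins.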
Param Array and Ellipsis
Precedence of the param array for resolving overloaded function
calls has changed from Managed Extensions for C++ to Visual C++
2008.
Managed Extensions has no explicit support for the param array
that C# and Visual Basic support. Instead, one flags an ordinary
array with an attribute, as follows:
void Trace1( String* format, [ParamArray]Object* args[] );
void Trace2( String* format, Object* args[] );
While these both look the same, the ParamArray
attribute tags this for C# or other CLR languages as an array taking
a variable number of elements with each invocation. The change in
behavior in programs between Managed Extensions and the new syntax
is in the resolution of an overloaded function set in which one
instance declares an ellipsis and a second declares a
ParamArray attribute, as in the following
example provided by Artur Laksberg.
int foo(...); // 1
int foo( [ParamArray] Int32[] ); // 2
In Managed Extensions, the ellipsis was given precedence over the
attribute, which is reasonable since the attribute is not a formal
aspect of the language. However, in the new syntax, the param array
is now supported directly within the language, and it is given
precedence over the ellipsis because it is more strongly typed.
Thus, in Managed Extensions, the call
foo( 1, 2 );
resolves to foo(...) while in the new syntax, it resolves to the
ParamArray instance. On the off chance that your program behavior
depends on the invocation of the ellipsis instance over that of the
ParamArray, you will need to modify either the signature or the
call.
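ISO C++ has no param array, but the reasoning that a more strongly typed parameter list should win over an ellipsis already applies to native overloads. The sketch below is only an analogy: the typed overload foo(int, int) is a hypothetical stand-in for the ParamArray instance.

```cpp
#include <string>

// Untyped: accepts any number of arguments of any type.
inline std::string foo(...) { return "foo(...)"; }

// Typed stand-in for the ParamArray instance (an assumption; the real
// C++/CLI declaration takes a managed array).
inline std::string foo(int, int) { return "foo(int, int)"; }

inline std::string call_foo() {
    // The ellipsis conversion ranks below every other conversion,
    // so the typed overload is selected.
    return foo(1, 2);
}
```

The new syntax extends this same preference to the param array, which is why existing calls can silently change targets.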
typeof Goes to T::typeid
The __typeof operator used in Managed Extensions for C++ has been
supplanted by the typeid keyword in Visual C++ 2008.
In Managed Extensions, the __typeof() operator returns the
associated Type* object when passed the name of a managed type.
For example:
// Creates and initializes a new Array instance.
Array* myIntArray =
Array::CreateInstance( __typeof(Int32), 5 );
In the new syntax, __typeof has been
replaced by an additional form of typeid
that returns a Type^ when a managed type
is specified.
// Creates and initializes a new Array instance.
Array^ myIntArray =
Array::CreateInstance( Int32::typeid, 5 );
Initializer Lists
Initializer lists in constructors are now executed before the
base class constructor.
Remarks
Prior to Visual C++ 2005, the base class constructor was called
before the initializer list was executed when compiling with
Managed Extensions for C++. Now, when compiling with /clr, the
initializer list is executed first.
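For contrast, the native ordering can be recorded with a small ISO C++ sketch. It shows only the ISO C++ order (base constructor first), which is also the order described above for Managed Extensions; it does not reproduce the /clr behavior. Base, Derived, trace, and note are hypothetical names.

```cpp
#include <string>
#include <vector>

// Shared log of construction events (hypothetical helper).
inline std::vector<std::string>& trace() {
    static std::vector<std::string> entries;
    return entries;
}

// Records an event and passes a value through, so it can be used
// inside a member initializer.
inline int note(const char* what, int value) {
    trace().push_back(what);
    return value;
}

struct Base {
    Base() { note("Base constructor", 0); }
};

struct Derived : Base {
    int member;
    Derived() : member(note("member initializer", 42)) {
        note("Derived body", 0);
    }
};

// Constructs one Derived and returns the recorded order.
inline std::vector<std::string> construction_order() {
    trace().clear();
    Derived d;
    (void)d;
    return trace();
}
```

In ISO C++ the recorded order is the base constructor, then the member initializer, then the constructor body. Code that relies on the base constructor having already run when a member initializer executes is the code to review when moving to /clr.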
Cast Notation and Introduction of safe_cast<>
The cast notation has changed from Managed Extensions for C++ to
Visual C++ 2008.
Modifying an existing structure is a different
and more difficult experience than crafting the initial structure.
There are fewer degrees of freedom, and the solution tends towards a
compromise between an ideal restructuring and what is practicable
given the existing structural dependencies.
Language extension is another example. Back in the early 1990s, as
object-oriented programming became an important paradigm, the need
for a type-safe downcast facility in C++ became pressing.
Downcasting is the user-explicit conversion of a base-class pointer
or reference to a pointer or reference of a derived class.
Downcasting requires an explicit cast. The reason is that the actual
type of the base class pointer is an aspect of the runtime; the
compiler therefore cannot check it. Or, to rephrase that, a downcast
facility, just like a virtual function call, requires some form of
dynamic resolution. This raises two questions:
1. Why should a downcast be necessary in the object-oriented
paradigm? Isn't the virtual function mechanism sufficient? That
is, why can't one claim that any need for a downcast (or a cast
of any sort) is a design failure?
2. Why should support of a downcast be a problem in C++? After
all, it is not a problem in object-oriented languages such as
Smalltalk (or, subsequently, Java and C#). What is it about C++
that makes supporting a downcast facility difficult?
A virtual function represents a type-dependent algorithm common
to a family of types. (We are not considering interfaces, which are
not supported in ISO-C++ but are available in CLR programming and
which represent an interesting design alternative). The design of
that family is typically represented by a class hierarchy in which
there is an abstract base class declaring the common interface (the
virtual functions) and a set of concrete derived classes which
represent the actual family types in the application domain.
A Light hierarchy in a Computer
Generated Imagery (CGI) application domain, for example, will have
common attributes such as color,
intensity, position,
on, off, and so
on. One can control several lights by using the common interface
without worrying whether a particular light is a spotlight, a
directional light, a non-directional light (think of the sun), or
perhaps a barn-door light. In this case, downcasting to a particular
light-type to exercise its virtual interface is unnecessary. In a
production environment, however, speed is essential. One might
downcast and explicitly invoke each method if by doing so inline
execution of the calls can be performed instead of using the virtual
mechanism.
So, one reason to downcast in C++ is to suppress the virtual
mechanism in return for a significant gain in runtime performance.
(Note that the automation of this manual optimization is an active
area of research. However, it is more difficult to solve than
replacing the explicit use of the register
or inline keyword.)
A second reason to downcast falls out of the dual nature of
polymorphism. One way to think of polymorphism is as being divided
into a passive form and a dynamic form.
A virtual invocation (and a downcast facility) represents dynamic
uses of polymorphism: one is performing an action based on the
actual type of the base class pointer at that particular instant in
the execution of the program.
Assigning a derived class object to its base class pointer,
however, is a passive form of polymorphism; it is using the
polymorphism as a transport mechanism. This is the main use of
Object, for example, in pre-generic CLR
programming. When used passively, the base class pointer chosen for
transport and storage typically offers an interface that is too
abstract. Object, for example, provides
roughly five methods through its interface; any more specific
behavior requires an explicit downcast. For example, if we want to
adjust the angle of our spotlight or its rate of fall off, we would
have to downcast explicitly. A virtual interface within a family of
sub-types cannot practicably be a superset of all the possible
methods of its many children, and so a downcast facility will always
be needed within an object-oriented language.
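The spotlight case can be sketched in ISO C++. The class and member names here (Light, SpotLight, DirectionalLight, set_angle, narrow_if_spot) are illustrative assumptions, not part of any real library.

```cpp
// The common interface shared by every light.
struct Light {
    virtual ~Light() = default;
    bool on = false;
    virtual void turn_on() { on = true; }
};

// Spotlight-only behavior that the base interface does not expose.
struct SpotLight : Light {
    double angle = 30.0;
    void set_angle(double degrees) { angle = degrees; }
};

struct DirectionalLight : Light {};

// Turns on any light through the common interface, then adjusts the
// angle only if the light really is a spotlight.
// Returns true when the angle was adjusted.
inline bool narrow_if_spot(Light* light, double degrees) {
    light->turn_on();  // no downcast needed for the shared interface
    if (SpotLight* spot = dynamic_cast<SpotLight*>(light)) {
        spot->set_angle(degrees);  // subtype-specific: needs the downcast
        return true;
    }
    return false;
}
```

The base pointer is enough for turn_on(), but set_angle() is reachable only after the explicit downcast, which is the passive-transport situation described above.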
If a safe downcast facility is needed in an object-oriented
language, then why did it take C++ so long to add one? The problem
is in how to make the information as to the run-time type of the
pointer available. In the case of a virtual function, the run-time
information is set up in two parts by the compiler:
1. The class object contains an additional virtual table
pointer member (either at the beginning or end of the class
object; that has an interesting history in itself) that
addresses the appropriate virtual table. For example, a
spotlight object addresses a spotlight virtual table, a
directional light addresses a directional light virtual table,
and so on.
2. Each virtual function has an associated fixed slot in the
table, and the actual instance to invoke is represented by the
address stored within the table. For example, the virtual
Light destructor might be associated with slot 0, Color with
slot 1, and so on. This is an efficient if inflexible strategy
because it is set up at compile-time and represents a minimal
overhead.
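The two parts can be imitated by hand in ISO C++. This is a conceptual sketch only: real compilers generate the table and the hidden pointer themselves, the layout is implementation-specific, and every name below (LightObject, VTable, the slot functions) is hypothetical.

```cpp
#include <string>

struct LightObject;

// One slot holds the address of one function.
using Slot = std::string (*)(const LightObject&);

// A table with two fixed slots: slot 0 is "kind", slot 1 is "color".
struct VTable {
    Slot slots[2];
};

// Part 1: each object carries a pointer to the table for its type.
struct LightObject {
    const VTable* vptr;
};

inline std::string spot_kind(const LightObject&)  { return "spotlight"; }
inline std::string spot_color(const LightObject&) { return "white"; }
inline std::string dir_kind(const LightObject&)   { return "directional"; }
inline std::string dir_color(const LightObject&)  { return "yellow"; }

// Part 2: one table per type, filled in before the program runs.
const VTable spot_vtable        = {{ spot_kind, spot_color }};
const VTable directional_vtable = {{ dir_kind, dir_color }};

// A "virtual call": fetch the table, index the fixed slot, invoke.
inline std::string call_slot(const LightObject& light, int slot) {
    return light.vptr->slots[slot](light);
}
```

A call costs one extra indirection through the table pointer, and the object grows by exactly one pointer, which is why the scheme is described as efficient if inflexible.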
The problem, then, is how to make the type information available
to the pointer without changing the size of C++ pointers, either by
adding a second address or by directly adding some sort of type
encoding. Changing the size would not be acceptable to those
programmers (and programs) that decide not to use the
object-oriented paradigm, which was still the predominant user
community. Another possibility was to introduce a special pointer
for polymorphic class types, but this would be confusing, and make
it difficult to inter-mix the two, particularly with issues of
pointer arithmetic. It would also not be acceptable to maintain a
run-time table that associates each pointer with its current type
and to update that table dynamically.
The problem then is a pair of user communities which have
different but legitimate programming aspirations. The solution has
to be a compromise between the two communities, allowing each not
only their aspiration but the ability to interoperate. This means
that the solutions offered by either side are likely to be
infeasible and the solution implemented finally to be less than
perfect. The actual resolution revolves around the definition of a
polymorphic class: a polymorphic class is one that contains a
virtual function. A polymorphic class supports a dynamic type-safe
downcast. This solves the maintain-the-pointer-as-address problem
because all polymorphic classes contain that additional pointer
member to their associated virtual table. The associated type
information, therefore, can be stored in an expanded virtual table
structure. The cost of the type-safe downcast is (almost) localized
to users of the facility.
The next issue with the type-safe downcast was its syntax.
Because it is a cast, the original proposal to the ISO-C++ committee
used the unadorned cast syntax, as in this example:
spot = ( SpotLight* ) plight;
but this was rejected by the committee because it did not allow
the user to control the cost of the cast. If the dynamic type-safe
downcast has the same syntax as the previously unsafe but static
cast notation, then it becomes a substitution, and the user has no
ability to suppress the runtime overhead when it is unnecessary and
perhaps too costly.
In general, in C++, there is always a mechanism by which to
suppress compiler-supported functionality. For example, we can turn
off the virtual mechanism by either using the class scope operator (Box::rotate(angle))
or by invoking the virtual method through a class object (rather
than a pointer or reference of that class). This latter suppression
is not required by the language but is a quality of implementation
issue, similar to the suppression of the construction of a temporary
in a declaration of the form:
// compilers are free to optimize away the temporary
X x = X( 10 );
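Both ways of turning off the virtual mechanism mentioned above can be sketched in ISO C++. Box and rotate come from the text; TiltedBox and the three helper functions are hypothetical additions for the illustration.

```cpp
#include <string>

struct Box {
    virtual ~Box() = default;
    virtual std::string rotate(double) { return "Box::rotate"; }
};

struct TiltedBox : Box {
    std::string rotate(double) override { return "TiltedBox::rotate"; }
};

// Ordinary virtual dispatch through a reference.
inline std::string virtual_call(Box& b, double angle) {
    return b.rotate(angle);
}

// The class scope operator names the function statically.
inline std::string qualified_call(Box& b, double angle) {
    return b.Box::rotate(angle);
}

// Invocation through a class object: the parameter is a Box by value,
// so its dynamic type is always Box.
inline std::string by_value_call(Box b, double angle) {
    return b.rotate(angle);
}
```

Passing a TiltedBox to each helper gives the derived version only for virtual_call; the other two resolve to Box::rotate.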
So the proposal was taken back for further consideration, and
several alternative notations were considered. The one brought
back to the committee was of the form (?type), which indicated its
undetermined (that is, dynamic) nature. This gave the user the
ability to toggle between the two forms, static or dynamic, but no
one was too pleased with it. So it was back to the drawing board.
The third and successful notation is the now standard
dynamic_cast, which was generalized to a set of four new-style
cast notations.
In ISO-C++, dynamic_cast returns 0 when applied to an
inappropriate pointer type, and throws a std::bad_cast exception
when applied to an inappropriate reference type. In Managed
Extensions for C++, a failed dynamic_cast applied to a managed
reference type always returned 0 (because of the type's pointer
representation). __try_cast was introduced as an analog to the
exception-throwing variant of dynamic_cast, except that it throws
System::InvalidCastException if the cast fails.
public __gc class ItemVerb;
public __gc class ItemVerbCollection {
public:
    ItemVerb *EnsureVerbArray() [] {
        return __try_cast<ItemVerb *[]>
            (verbList->ToArray(__typeof(ItemVerb *)));
    }
};
In the new syntax, __try_cast has been
recast as safe_cast. Here is the same code
fragment in the new syntax:
public ref class ItemVerb;
public ref class ItemVerbCollection {
public:
    array<ItemVerb^>^ EnsureVerbArray() {
        return safe_cast<array<ItemVerb^>^>
            ( verbList->ToArray( ItemVerb::typeid ));
    }
};
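The ISO C++ half of this comparison, a null result for pointers and std::bad_cast for references, can be checked with a small native sketch. Item, Verb, and Noun are hypothetical classes; the managed __try_cast and safe_cast forms are not reproduced here because they require /clr.

```cpp
#include <typeinfo>

struct Item { virtual ~Item() = default; };
struct Verb : Item {};
struct Noun : Item {};

// Pointer form: a failed cast yields a null pointer.
inline bool pointer_cast_fails_with_null(Item* item) {
    return dynamic_cast<Verb*>(item) == nullptr;
}

// Reference form: a failed cast throws std::bad_cast, since there is
// no null reference to return.
inline bool reference_cast_throws(Item& item) {
    try {
        Verb& verb = dynamic_cast<Verb&>(item);
        (void)verb;
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}
```

safe_cast plays the role of the throwing form for managed types, with System::InvalidCastException taking the place of std::bad_cast.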
In the managed world, it is important to allow for verifiable
code by limiting the ability of programmers to cast between types in
ways that leave the code unverifiable. This is a critical aspect of
the dynamic programming paradigm represented by the new syntax. For
this reason, instances of old-style casts are recast internally as
run-time casts, so that, for example:
// internally recast into the
// equivalent safe_cast expression above
( array<ItemVerb^>^ ) verbList->ToArray( ItemVerb::typeid );
On the other hand, because polymorphism provides both an active
and a passive mode, it is sometimes necessary to perform a downcast
just to gain access to the non-virtual API of a subtype. This can
occur, for example, with the member(s) of a class that want to
address any type within the hierarchy (passive polymorphism as a
transport mechanism) but for which the actual instance within a
particular program context is known. In this case, having a run-time
check of the cast can be an unacceptable overhead. If the new syntax
is to serve as the managed systems programming language, it must
provide some means of allowing a compile-time (that is, static)
downcast. That is why the application of the
static_cast notation is allowed to remain a compile-time
downcast:
// ok: cast performed at compile-time.
// No run-time check for type correctness
static_cast< array<ItemVerb^>^ >(verbList->ToArray(ItemVerb::typeid));
The problem is that there is no way to guarantee that the
programmer doing the static_cast is correct and well-intentioned;
that is, there is no way to force managed code to be verifiable.
This is a more urgent concern under the dynamic programming
paradigm than under native, but it is not sufficient reason,
within a system programming language, to deny the user the ability
to toggle between a static and a run-time cast.
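The toggle between a static and a run-time cast looks like this in native ISO C++. The sketch uses hypothetical Light and SpotLight classes and only illustrates the trade-off; it does not model verifiability, which is a CLR concept.

```cpp
#include <typeinfo>

struct Light { virtual ~Light() = default; };
struct SpotLight : Light { double angle = 30.0; };

// Static downcast: no run-time check. The caller must guarantee that
// `light` really refers to a SpotLight; otherwise behavior is undefined.
inline double angle_unchecked(Light& light) {
    return static_cast<SpotLight&>(light).angle;
}

// Run-time downcast: checked, throws std::bad_cast on a mismatch.
inline double angle_checked(Light& light) {
    return dynamic_cast<SpotLight&>(light).angle;
}
```

When the actual type is known from program context, as in the passive-transport case, the unchecked form avoids the cost of the check; the checked form is the safer default everywhere else.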
There is a performance trap and pitfall in the new syntax,
however. In native programming, there is no difference in
performance between the old-style cast notation and the new-style
static_cast notation. But in the new syntax, the old-style cast
notation is significantly more expensive than the new-style
static_cast notation. The reason is that the compiler internally
transforms the use of the old-style notation into a run-time check
that throws an exception on failure. Moreover, it also changes the
execution profile of the code, because a failed cast now causes an
uncaught exception that brings down the application. That is
perhaps wise, but the same error would not cause that exception if
the static_cast notation were used. One might argue this will help
prod users into using the new-style notation. But only when it
fails; otherwise, it will cause programs that use the old-style
notation to run significantly slower with no visible indication of
why, similar to the following C++ programmer pitfalls:
// pitfall # 1:
// initialization can remove a temporary class object,
// assignment cannot
Matrix m;
m = another_matrix;
// pitfall # 2: declaration of class objects far from their use
Matrix m( 2000, 2000 ), n( 2000, 2000 );
if ( ! mumble ) return;
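The first pitfall can be made visible by counting special member function calls. This is a minimal sketch in ISO C++: the Matrix here is a hypothetical stand-in that does no real work, and Counts and counts() are helper names invented for the illustration.

```cpp
// Tallies of the special member functions that have run.
struct Counts {
    int default_constructed = 0;
    int copy_constructed = 0;
    int assigned = 0;
};

inline Counts& counts() {
    static Counts c;
    return c;
}

// Stand-in Matrix that only records which operations were invoked.
struct Matrix {
    Matrix() { ++counts().default_constructed; }
    Matrix(const Matrix&) { ++counts().copy_constructed; }
    Matrix& operator=(const Matrix&) {
        ++counts().assigned;
        return *this;
    }
};

// Pitfall form: default construction followed by assignment.
inline Counts assign_after_default(const Matrix& source) {
    counts() = Counts();
    Matrix m;
    m = source;
    return counts();
}

// Preferred form: a single copy construction.
inline Counts initialize_directly(const Matrix& source) {
    counts() = Counts();
    Matrix m = source;
    (void)m;
    return counts();
}
```

The first form pays for two operations on the matrix where the second pays for one, and nothing in the source text warns the reader about the difference, which is the parallel being drawn with the old-style cast.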