What's new for V8.0

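The new features and enhancements in XL C/C++ Enterprise Edition V8.0 for AIX fall into four categories:

  • Performance and optimization
  • Support for language enhancements and APIs
  • Ease of use
  • New compiler options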

Performance and optimization

Many new features and enhancements fall into the category of optimization and performance tuning.

Architecture and processor-specific code tuning

The -qarch compiler option controls the particular instructions that are generated for the specified machine architecture. The -qtune compiler option adjusts the instructions, scheduling, and other optimizations to enhance performance on the specified hardware. These options work together to generate application code that gives the best performance for the specified architecture.

XL C/C++ V8.0 augments the list of suboptions available to the -qarch compiler option to support newly-available POWER5+ processors and processors that support the VMX instruction set. The following new -qarch options are available:

High performance libraries

XL C/C++ includes highly-tuned mathematical functions that can greatly improve the performance of mathematically-intensive applications. These functions are provided through the following high-performance libraries:

Mathematical Acceleration Subsystem (MASS)
MASS libraries provide high-performance scalar and vector functions to perform common mathematical computations. The MASS libraries included with XL C/C++ Enterprise Edition V8.0 for AIX introduce new scalar and vector functions, and new support for the POWER5 processor architecture.

For more information about using the MASS libraries, see Using the Mathematical Acceleration Subsystem.

Basic Linear Algebra Subprograms (BLAS)
XL C/C++ Enterprise Edition V8.0 for AIX introduces the BLAS set of high-performance algebraic functions. You can use these functions to:

For more information about using the BLAS functions, see Using the Basic Linear Algebra Subprograms.

VMX support

XL C/C++ now supports vector multimedia extension (VMX) instructions and the AltiVec programming model.

Objects compiled with vector data types and related operations can run on systems with processor architectures and operating systems (AIX 5L Version 5.3 with the 5300-03 Recommended Maintenance package or higher) that support the single instruction, multiple data (SIMD) instruction set. The SIMD instruction set (also known as vector multimedia extension or VMX instructions) enables higher utilization of microprocessor hardware and supports performing calculations in parallel. The compiler provides the ability to automatically enable SIMD vectorization at higher levels of optimization.

This release of XL C/C++ introduces several new option and suboption combinations to enable and exploit VMX instructions.

Table 2. VMX-Related Compiler Options and Directives
Option/directive Description
-qenablevmx | -qnoenablevmx Setting -qenablevmx enables compiler generation of VMX instructions.

Setting -qnoenablevmx disables compiler generation of VMX instructions. This is the compiler default setting.

Note:
You can set -qenablevmx (and the other options described in this section that cause the compiler to generate VMX instructions) even when you compile your application on a system that does not support VMX instructions, provided that the compiled objects are intended to run later on a system that does support them.
-qaltivec | -qnoaltivec Setting -qaltivec instructs the compiler to support programming with vector data types and operators. This option has effect only when -qenablevmx is also in effect.

Setting -qnoaltivec disables vector support.

-qhot=simd | -qhot=nosimd When -qhot=simd is in effect, the compiler tries to improve application performance by converting certain loop operations on successive elements in an array into faster, more efficient VMX instructions. This option has effect only when the target architecture supports VMX instructions and -qenablevmx is set.

When -qhot=nosimd is in effect, the compiler performs optimizations on loops and arrays, but does not replace code with calls to VMX instructions.

-qvecnvol | -qnovecnvol -qvecnvol instructs the compiler to generate objects that use both volatile and non-volatile vector registers, providing potential performance benefits on systems that support VMX instructions.

-qnovecnvol instructs the compiler to generate objects that use only volatile vector registers. Volatile vector registers do not preserve their values across function calls or context save/jump/switch system library functions. Setting this option will make your vector applications safe where there is risk of interaction with objects built with AIX libraries prior to AIX 5.3 with 5300-03, but may also result in reduced application performance. This is the compiler default setting.
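As a rough illustration of the kind of loop that -qhot=simd targets, the sketch below operates on successive elements of two arrays. The function name, the file name, and the command line in the comment are assumptions made for this example; whether the compiler actually generates VMX instructions for the loop depends on the target architecture, the optimization level, and the options in effect.

```c
/*
 * Hypothetical compile line (placeholder architecture, not taken from
 * this document):
 *   xlc -O3 -qarch=<vmx-capable-arch> -qenablevmx -qhot=simd scale_add.c
 */

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
   Each iteration touches the next element of both arrays, which is the
   access pattern described for -qhot=simd above. */
void scale_add(float *y, const float *x, float a, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The function is ordinary C and produces the same results whether or not the loop is vectorized; the options only influence the instructions that are generated.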

Other performance-related compiler options and directives

The entries in the following table describe new or changed compiler options and directives not already mentioned in the sections above.

Information presented here is just a brief overview. For more information about these compiler options, refer to Options for performance optimization.

Table 3. Other Performance-Related Compiler Options and Directives
Option/directive Description
-qhot -qhot adds the following new suboptions:
-qhot=level=0
The compiler performs a subset of high-order transformations.
  • -qhot=novector
  • -qhot=nosimd
  • -qhot=noarraypad
This setting is the default when -O3 optimization is in effect.
-qhot=level=1
The compiler performs the complete range of high-order transformations.
  • -qhot=vector
  • -qhot=simd
  • -qhot=arraypad
This setting is the default when -O4 or -O5 optimization is in effect.
-qhot=simd
Described above in VMX support.
-qipa -qipa adds the following new suboptions:
-qipa=clonearch=arch{,arch}
Specifies one or more processor architectures for which multiple versions of the same instruction set are produced.

XL C/C++ lets you specify multiple specific processor architectures for which instruction sets will be generated. At run time, the application will detect the specific architecture of the operating environment and select the instruction set specialized for that architecture.

-qipa=cloneproc=name{,name}
Specifies the names of one or more functions to clone for the processor architectures specified by the clonearch suboption.
-qipa=malloc16
This new option has effect only at link time. It asserts to the compiler that dynamic memory allocation routines such as malloc, calloc, realloc, and new will return addresses aligned on 16-byte boundaries, and instructs the compiler to optimize generated code according to that assertion. This option is set by default when compiling in 64-bit mode, but can be overridden with -qipa=nomalloc16.
Notes:
  1. You must specify -qipa=malloc16 only if you can ensure that executables created with this option will be run in an environment where dynamic memory allocations return addresses aligned on 16-byte boundaries.
  2. If you are using -qhot=simd, you should also consider specifying -qipa=malloc16 to expose additional VMX optimization opportunities.
-O Specifying the -O3 compiler option now instructs the compiler to also assume the -qhot=level=0 compiler option setting.

Specifying the -O4 or -O5 compiler option now instructs the compiler to also assume the -qhot=level=1 compiler option setting.
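Because -qipa=malloc16 is an assertion about the run-time environment, it can be useful to spot-check what the allocator actually returns. The following is a minimal sketch; the helper names is_aligned16 and count_misaligned are hypothetical and are not part of XL C/C++. A clean result is only an indication for the sizes tested, not a guarantee about every allocation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if the address is a multiple of 16, and 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Allocates a block of `size` bytes `trials` times and returns how many
   of the returned addresses were NOT 16-byte aligned. Failed allocations
   are skipped. */
int count_misaligned(size_t size, int trials)
{
    int bad = 0;
    int i;
    for (i = 0; i < trials; i++) {
        void *p = malloc(size);
        if (p == NULL) {
            continue;
        }
        if (!is_aligned16(p)) {
            bad++;
        }
        free(p);
    }
    return bad;
}
```

If count_misaligned reports a nonzero value in the target environment, the alignment assertion does not hold there and -qipa=nomalloc16 is the safer setting.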

Built-in functions new for this release

The following table lists built-in functions that are new for this release. For more information on built-in functions provided by XL C/C++, see Built-in functions for POWER and PowerPC architectures.

Table 4. Built-in functions for XL C/C++
Function Description
void* __builtin_return_address (unsigned int level); Returns the return address of the current function, or of one of its callers, where level is a constant literal indicating the number of frames to scan up the call stack.
void* __builtin_frame_address (unsigned int level); Returns the address of the function frame of the current function, or of one of its callers, where level is a constant literal indicating the number of frames to scan up the call stack.
int __compare_and_swap(volatile int* addr, int* old_val_addr, int new_val); Performs an atomic operation which compares the contents of a single word variable with a stored old value.
int __compare_and_swaplp(volatile long* addr, long* old_val_addr, long new_val); Performs an atomic operation which compares the contents of a double word variable with a stored old value.
int __fetch_and_add(volatile int* addr, int val); Increments the single word specified by addr by the amount specified by val in a single atomic operation.
long __fetch_and_addlp(volatile long* addr, long val); Increments the double word specified by addr by the amount specified by val in a single atomic operation.
unsigned int __fetch_and_and(volatile unsigned int* addr, unsigned int val); Clears bits in the single word specified by addr by AND-ing that value with the input val parameter, in a single atomic operation.
unsigned long __fetch_and_andlp(volatile unsigned long* addr, unsigned long val); Clears bits in the double word specified by addr by AND-ing that value with the input val parameter, in a single atomic operation.
unsigned int __fetch_and_or(volatile unsigned int* addr, unsigned int val); Sets bits in the single word specified by addr by OR-ing that value with the input val parameter, in a single atomic operation.
unsigned long __fetch_and_orlp(volatile unsigned long* addr, unsigned long val); Sets bits in the double word specified by addr by OR-ing that value with the input val parameter, in a single atomic operation.
unsigned int __fetch_and_swap(volatile unsigned int* addr, unsigned int val); Sets the single word specified by addr to the value of the input val parameter and returns the original contents of the memory location, in a single atomic operation.
double __frim(double val); Takes an input val in double format, rounds val down to the next lower integral value, and returns the result in double format. Valid only for POWER5+ processors.
float __frims(float val); Takes an input val in float format, rounds val down to the next lower integral value, and returns the result in float format. Valid only for POWER5+ processors.
double __frin(double val); Takes an input val in double format, rounds val to the nearest integral value, and returns the result in double format. Valid only for POWER5+ processors.
float __frins(float val); Takes an input val in float format, rounds val to the nearest integral value, and returns the result in float format. Valid only for POWER5+ processors.
double __frip(double val); Takes an input val in double format, rounds val up to the next higher integral value, and returns the result in double format. Valid only for POWER5+ processors.
float __frips(float val); Takes an input val in float format, rounds val up to the next higher integral value, and returns the result in float format. Valid only for POWER5+ processors.
double __friz(double val); Takes an input val in double format, rounds val to the next integral value closest to zero, and returns the result in double format. Valid only for POWER5+ processors.
float __frizs(float val); Takes an input val in float format, rounds val to the next integral value closest to zero, and returns the result in float format. Valid only for POWER5+ processors.
long __ldarx(volatile long* addr); Generates a Load Double Word And Reserve Indexed (ldarx) instruction. This instruction can be used in conjunction with a subsequent stdcx. instruction to implement a read-modify-write on a specified memory location.
int __lwarx(volatile int* addr); Generates a Load Word And Reserve Indexed (lwarx) instruction. This instruction can be used in conjunction with a subsequent stwcx. instruction to implement a read-modify-write on a specified memory location.
int __stdcx(volatile long* addr, long val); Generates a Store Double Word Conditional Indexed (stdcx.) instruction. This instruction can be used in conjunction with a preceding ldarx instruction to implement a read-modify-write on a specified memory location.
int __stwcx(volatile int* addr, int val); Generates a Store Word Conditional Indexed (stwcx.) instruction. This instruction can be used in conjunction with a preceding lwarx instruction to implement a read-modify-write on a specified memory location.
unsigned long __mftb(); Generates a Move From Time Base (mftb) hardware instruction.
unsigned int __mftbu(); Generates a Move From Time Base Upper (mftbu) hardware instruction.
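The atomic built-ins in the table can be sketched in use as follows. Several things here are assumptions rather than statements from this document: that __fetch_and_add returns the value held before the addition, that __compare_and_swap returns nonzero on success and refreshes *old_val_addr on failure, and that the compiler identifies itself through the __IBMC__ or __IBMCPP__ macro. So that the sketch also builds elsewhere, other compilers fall back to GCC-style built-ins, which are not part of XL C/C++. The helper names next_ticket and store_max are hypothetical.

```c
/* Select the XL built-ins when compiling with XL C/C++ (assumed macro
   check); otherwise substitute GCC/Clang equivalents for illustration. */
#if defined(__IBMC__) || defined(__IBMCPP__)
#define FETCH_AND_ADD(addr, val)        __fetch_and_add((addr), (val))
#define COMPARE_AND_SWAP(addr, old, nv) __compare_and_swap((addr), (old), (nv))
#else
#define FETCH_AND_ADD(addr, val)        __sync_fetch_and_add((addr), (val))
#define COMPARE_AND_SWAP(addr, old, nv)                                  \
    __atomic_compare_exchange_n((addr), (old), (nv), 0,                  \
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)
#endif

static volatile int ticket_counter = 0;

/* Hands out consecutive ticket numbers starting at 0. The increment is a
   single atomic operation, so concurrent callers get distinct values. */
int next_ticket(void)
{
    return FETCH_AND_ADD(&ticket_counter, 1);
}

/* Raises *addr to `candidate` if candidate is larger. Returns 1 if this
   call stored the value, and 0 if *addr was already at least as large. */
int store_max(volatile int *addr, int candidate)
{
    int seen = *addr;
    while (seen < candidate) {
        if (COMPARE_AND_SWAP(addr, &seen, candidate)) {
            return 1;
        }
        /* The swap failed: `seen` now holds the value another thread
           stored, so the loop condition is re-evaluated against it. */
    }
    return 0;
}
```

The retry loop in store_max is the usual pattern for building a read-modify-write operation from a compare-and-swap primitive. Any memory-ordering requirements around these calls are outside the scope of this sketch.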


Support for language enhancements and APIs

API and language enhancements can offer you additional ease of use and flexibility when developing your applications, as well as making it easier for you to develop code that more fully exploits the capabilities of your hardware platform.

New type traits library

This release of XL C/C++ introduces the type_traits library. This library is based on the type traits proposal submitted to the ISO C++ Standard committee for addition to the ISO C++ Standard, and approved for inclusion in Technical Report 1 (TR1) on C++ Library Extensions.

For more information about the type traits proposal, see:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1424.htm

OpenMP API V2.5 support for C, C++, and Fortran

XL C/C++ now supports the OpenMP API V2.5 standard. This latest level of the OpenMP specification combines the previous C/C++ and Fortran OpenMP specifications into a single specification for both C/C++ and Fortran, and resolves previous inconsistencies between them.

The OpenMP Application Program Interface (API) is a portable, scalable programming model that provides a standard interface for developing user-directed shared-memory parallelization in C, C++, and Fortran applications. The specification is defined by the OpenMP organization, a group of computer hardware and software vendors, including IBM.

You can find more information about OpenMP specifications at:

www.openmp.org
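As a small illustration of user-directed shared-memory parallelization, the sketch below sums an array with an OpenMP work-sharing loop. The function name parallel_sum is hypothetical. With XL C/C++, OpenMP directives are typically enabled with the -qsmp=omp option and a thread-safe invocation such as xlc_r; without OpenMP support enabled, the pragma is ignored and the loop runs serially with the same result.

```c
/* Returns the sum of the n elements of x.
   The reduction clause gives each thread a private partial sum and
   combines the partial sums when the parallel loop finishes. */
double parallel_sum(const double *x, int n)
{
    double total = 0.0;
    int i;

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}
```

The loop index is declared outside the loop so that the code also compiles at C89 language levels; OpenMP makes the index of a work-sharing loop private to each thread.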

VMX APIs

XL C/C++ now supports the AltiVec programming model and APIs. For more information about vector data types and operations, see:

You can also find more information about the AltiVec programming model and specifications at:

www.freescale.com


Ease of use

XL C/C++ includes the following new features to help you more easily use the compiler for your application development.

IBM Debugger for AIX

XL C/C++ Enterprise Edition V8.0 for AIX includes the IBM Debugger for AIX to help you detect and diagnose errors in compiled programs that are running locally or remotely. You can monitor variables, expressions, registers, memory, and application modules of the application you are debugging.

Support for IBM Tivoli License Manager

IBM Tivoli License Manager (ITLM) is a Web-based solution that can help you manage software usage metering and license allocation services on supported systems. In general, ITLM recognizes and monitors the products that are installed and in use on your system.

IBM XL C/C++ Enterprise Edition V8.0 for AIX is ITLM-enabled for inventory support only, which means that ITLM is able to detect product installation of XL C/C++, but not its usage.

Note:
ITLM is not a part of the XL C/C++ compiler offering, and must be purchased and installed separately.

Once installed and activated, ITLM scans your system for product inventory signatures that indicate whether a given product is installed on your system. ITLM also identifies that product's version, release, and modification levels. Signature files for XL C/C++ are installed to the following directory:

Default installations
/usr/vac
Non-default installations
compiler/usr/vac where compiler is the target directory for installation specified by the -b installation option.

For more information about IBM Tivoli License Manager, see:

www.ibm.com/software/tivoli/products/license-mgr


New compiler options

Compiler options can be specified on the command line or through directives embedded in your application source files. The following tables describe new compiler options, suboptions, and pragma directives not already described elsewhere in this section.

New command line options

The following table summarizes command line options new to XL C/C++. You can find detailed syntax and usage information for all compiler options in Compiler options reference.

Option Description and remarks
-qasm The -qasm compiler option adds new functionality. In addition to controlling how inline assembler statements in your program are interpreted, it now also controls whether code is emitted for asm statements.
-qasm_as The syntax of the -qasm_as compiler option has changed slightly.
-qlist The -qlist compiler option adds new offset and nooffset suboptions. Specifying -qlist=offset instructs the compiler to show object listing offsets from the start of a procedure rather than from the start of code generation.
-qmakedep The -qmakedep compiler option adds a new gcc suboption. Specifying -qmakedep=gcc instructs the compiler to generate make dependency information in a format similar to that used by the GNU C/C++ compiler.
-MF This new compiler option specifies a filename for the make dependency file generated by the -qmakedep or -M option.
-qppline This new compiler option enables generation of #line directives in preprocessed output. The -qnoppline compiler option disables generation of #line directives.
-qreserved_reg This new compiler option lets you reserve one or more register names. A reserved register cannot be used during compilation except as a stack pointer, frame pointer or in some other fixed role.
-qsourcetype This release adds assembler-with-cpp as a new suboption to the -qsourcetype compiler option.

Ordinarily, the compiler recognizes assembler source files that require preprocessing by the file's .S filename suffix. The compiler preprocesses .S source files and then sends the preprocessor output to the assembler.

Specifying -qsourcetype=assembler-with-cpp filename on the command line instructs the compiler to treat all filenames appearing after the option, regardless of filename suffix, as assembler source files that require preprocessing.

-qtmplinst This new compiler option manages how the compiler performs implicit instantiations of templates.
-qversion Specifying the -qversion compiler option returns the official compiler product name and version.

New pragma directives

The following table summarizes pragma directives new to XL C/C++. You can find detailed syntax and usage information in XL C/C++ Pragmas.

#pragma Directive Description and remarks
altivec_vrsave When the altivec_vrsave directive is in effect, function prologs and epilogs include code to maintain the VRSAVE register. This pragma has effect only when -qaltivec is in effect, must be used only within a function, and affects only the function in which it appears.
nosimd The nosimd directive prohibits the compiler from automatically generating vector multimedia extension (VMX) instructions in the loop immediately following the directive.
novector The novector directive prohibits the compiler from auto-vectorizing the loop immediately following the directive. Auto-vectorization refers to converting certain operations performed in a loop and on successive array elements, into a call to a routine that computes several results simultaneously.
STDC CX_LIMITED_RANGE The STDC CX_LIMITED_RANGE directive instructs the compiler that within the scope it controls, complex division and absolute value are only invoked with values such that intermediate calculation will not overflow or lose significance.
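To see why STDC CX_LIMITED_RANGE is a promise about the operand values, consider complex division written with the usual mathematical formula. The sketch below uses a hypothetical struct and function (cplx, naive_div) rather than the compiler's own complex types, and is meant only to show where the intermediate overflow comes from.

```c
/* Hypothetical complex type used only for this illustration. */
typedef struct {
    double re;
    double im;
} cplx;

/* Textbook division: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c*c + d*d).
   The denominator c*c + d*d can overflow to infinity even when the true
   quotient is an ordinary finite number. */
cplx naive_div(cplx x, cplx y)
{
    double denom = y.re * y.re + y.im * y.im;
    cplx q;
    q.re = (x.re * y.re + x.im * y.im) / denom;
    q.im = (x.im * y.re - x.re * y.im) / denom;
    return q;
}
```

For moderate operands such as 1/2 the formula is exact. For operands near 1e200 the squared terms overflow and the result is not a number, even though the true quotient of two equal values is 1. Setting the pragma to ON tells the compiler that such operands do not occur in the controlled scope, so the simpler formula is acceptable there.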
