Issue 121221.1: Allow DW_AT_type with DW_TAG_string_type
Section 5.9, pg 98
The purpose of this write up is to replace the existing proposal 120213.1 by Tobias Burnus:
"Allow DW_AT_type with DW_TAG_string_type"
Section 5.9 String Type Entries, pg 98
The DWARF spec currently does not allow to specify the type (DW_AT_type) for strings (DW_TAG_string_type).
That's a problem for Fortran, where different string types can exists. For instance, GCC's gfortran [since
4.4] has a kind=1 (one byte) and a kind=4 (four bytes, for UCS-4) character type. The NAG compiler (nagfor
5.3) supports besides the one-byte type also UCS-2, UCS-4, and Shift-JIS.
Note: UCS-4 is an (optional) feature of the Fortran 2003 standard (ISO/IEC 1539-1:2004; see also Fortran
2008, ISO/IEC 1539-1:2010).
[Information from John Bishop]
From the document with this title page:
WD 1539-1 J3/10-007r1 (F2008 Working Document) 24th November 2010 16:43
Section 4.4.3 Character type; subsection 4.4.3.1 Character sets, paragraphy 3:
[quote]
The character set specified in ISO/IEC 646:1991 (International Reference Version) is referred to as the
ASCII character set and its corresponding representation method is ASCII character kind. The character
set UCS-4 as specified in ISO/IEC 10646 is referred to as the ISO 10646 character set and its corresponding
method is the ISO 10646 character kind.
[end quote]
Section 4.4.3.2 says that there is no standard list of character "kinds" but every kind provided by the
"processor" (the combined hardware/software system from a vendor) must have a blank character.
Section 13.7.145 ("selected_char_kind(name)") says that the defined kinds are DEFAULT, ASCII and ISO_10646
and that processors may support other kinds.
[end Information from John Bishop]
Proposal:
A string type entry may have a DW_AT_type attribute describing how each character is encoded and is to be
interepreted. The value of this attribute is a base type entry represented by a debugging information entry
with the tag DW_TAG_base_type. If the attribute is absent, then the character is encoded using the system
default.
Add new encoding value:
DW_ATE_UCS Unicode character (fixed size per character)
DW_ATE_ascii ASCII character
The DW_ATE_UCS encoding is intended for Unicode character encoding (see ISO/IEC 10646). For example, UCS-4
(ISO 10646 character set) is represented by a base type entry with an encoding attribute whose value is
DW_ATE_UCS and a byte size attribute whose value is 4.
The DW_ATE_ascii encoding is intended for single byte ASCII character encoding (see ISO/IEC 646:1991, ASCII
character set).
Fortran 2003 example:
program character_kind
use iso_fortran_env
implicit none
integer, parameter :: ascii = selected_char_kind ("ascii")
integer, parameter :: ucs4 = selected_char_kind ('ISO_10646')
character(kind=ascii, len=26) :: alphabet
character(kind=ucs4, len=30) :: hello_world
alphabet = ascii_"abcdefghijklmnopqrstuvwxyz"
hello_world = ucs4_'Hello World and Ni Hao -- ' &
// char (int (z'4F60'), ucs4) &
// char (int (z'597D'), ucs4)
write (*,*) alphabet
open (output_unit, encoding='UTF-8')
write (*,*) trim (hello_world)
end program character_kind
Proposed DWARF output:
$1: DW_TAG_base_type
DW_AT_encoding (DW_ATE_ascii)
$2: DW_TAG_base_type
DW_AT_encoding (DW_ATE_UCS)
DW_AT_byte_size (4)
$3: DW_TAG_string_type
DW_AT_type ($1)
DW_AT_string_length ( ... )
DW_AT_string_length_byte_size ( ... )
DW_AT_data_location ( ... )
$4: DW_TAG_string_type
DW_AT_type ($2)
DW_AT_string_length ( ... )
DW_AT_string_length_byte_size ( ... )
DW_AT_data_location ( ... )
$5: DW_TAG_variable
DW_AT_name (alphabet)
DW_AT_type ($3)
DW_AT_location ( ... )
$6: DW_TAG_variable
DW_AT_name (hello_world)
DW_AT_type ($4)
DW_AT_location ( ... )