Issue 140724.1: Line File Table
This is a REPLACEMENT for Issue 140724.1 which was discussed at
the August 19 meeting. (That proposal itself was a follow-on to
previously approved Issues 140323.1 and 140331.2 which were
reconsidered at the August meeting and rejected in favor of
considering 140724.1). This version reflects changes and consensus
from the August meeting.
The main changes in this version compared to the draft considered
before are:
- A new form DW_FORM_line_strp is introduced, which is designed
for references into the .debug_line_str section.
- Clarify that there is no .debug_line_str.dwo section to be used
in a .dwo file; rather string references in the .debug_line
section should use form DW_FORM_strp or DW_FORM_strx.
- Clean up latent file stripping problems pointed out by Cary.
- Add form DW_FORM_data16 for holding MD5 values (in particular).
This version has also benefited greatly from review by Cary.
Here is the revised proposal:
=====================================================================
(1) In 6.2.4, add two new fields following the version field and
renumber subsequent bullets (and renumber subsequent fields):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3. address_size (ubyte)
A 1-byte unsigned integer containing the size in bytes of an
address (or offset portion of an address for segmented addressing)
on the target system.
*The address_size field is new in DWARF Version 5. It is needed
to legitimize the common practice of stripping all but the line
number sections (.debug_line and .debug_line_str) from an executable.*
4. segment_size (ubyte)
A 1-byte unsigned integer containing the size in bytes of a segment
selector on the target system.
*The segment_size field is new in DWARF Version 5. It is needed
in combination with the address_size field (preceding) to accurately
characterize the address representation on the target system.*
---------------------------------------------------------------------
=====================================================================
(2) In 6.2.4, replace all text starting at bullet number 11
through the end of the section with the following (note that the
previous bullet 11 becomes new bullet 13 because of the two fields
added above):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*The remaining fields provide information about the
source files used in the compilation. These fields
have been revised in DWARF Version 5 to support these
goals:
. To allow new alternative means for a consumer to
check that a file it can access is the same version
as that used in the compilation.
. To allow a producer to collect file name strings in
a new section (.debug_line_str) that can be used
to merge duplicate file name strings.
. To add the ability for producers to provide vendor-
defined information that can be skipped by a consumer
that is unprepared to process it.*
13. directory_entry_format_count (ubyte)
A count of the number of entries that occur in the
following directory_entry_format field.
14. directory_entry_format (sequence of uleb pairs)
A sequence of directory entry format descriptions.
Each description consists of a pair of uleb values:
. A content type code (see below)
. A form code using the attribute form codes
15. directories_count (uleb)
A count of the number of entries that occur in the
following directories field.
16. directories (sequence of directory names)
A sequence of directory names and optional related
information. Each entry is encoded as described
by the directory_entry_format field.
Entries in this sequence describe each path that was
searched for included source files in this compilation,
including the compilation directory of the compilation.
(The paths include those directories specified by the
user for the compiler to search and those the compiler
searches without explicit direction.)
The first entry is the current directory of the compilation.
Each additional path entry is either a full path name or
is relative to the current directory of the compilation.
The line number program assigns a number (index) to each
of the directory entries in order, beginning with 0.
*Prior to DWARF Version 5, the current directory was not
represented in the directories field and a directory index
of 0 implicitly referred to that directory as found in the
DW_AT_comp_dir attribute of the compilation unit DIE. In
DWARF Version 5, the current directory is explicitly present
in the directories field. This is needed to legitimize the
common practice of stripping all but the line number sections
(.debug_line and .debug_line_str) from an executable.*
*Note that if a .debug_line_str section is present, both
the compilation unit DIE and the line number header can
share a single copy of the current directory name string.*
17. file_name_entry_format_count (ubyte)
A count of the number of file entry format entries that
occur in the following file_name_entry_format field. If this
field is zero, then the file_names_count field (see below)
must also be zero.
18. file_name_entry_format (sequence of uleb pairs)
A sequence of file entry format descriptions.
Each description consists of a pair of uleb values:
. A content type code (see below)
. A form code using the attribute form codes
19. file_names_count (uleb)
A count of the number of file name entries that occur
in the following file_names field.
20. file_names (sequence of file name entries)
A sequence of file names and optional related
information. Each entry is encoded as described
by the file_name_entry_format field (in the
order described).
Entries in this sequence describe source files that
contribute to the line number information for this
compilation or are used in other contexts, such as in
a declaration coordinate or a macro file inclusion.
The primary source file is described by an entry whose
path name exactly matches that given in the DW_AT_name
attribute in the compilation unit.
The line number program assigns numbers to each of
the file name entries in order, beginning with 1, and uses
those numbers instead of file names in the line number
program that follows.
*A producer may generate an empty file_names field using
a file_names_count value of 0 and subsequently define file
names using the extended opcodes DW_LNE_define_file or
DW_LNE_define_file_MD5 (although these may not be intermixed
in the same line number section). In this case, the
file_name_entry_format_count should also be zero.*
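
The layout of fields 13 through 20 can be illustrated with a minimal
consumer-side sketch. It is not part of the proposed text and makes
several assumptions: the content type codes are the values proposed in
Table 7.25, DW_FORM_string (0x08) and DW_FORM_udata (0x0f) use their
existing DWARF codes, only those two forms are handled, and the helper
names (read_uleb, read_entry_format, read_value, read_entries) are
hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed numeric values: content type codes as proposed in Table 7.25,
# form codes as already assigned in DWARF.
DW_LNCT_path = 0x1
DW_LNCT_directory_index = 0x2
DW_FORM_string = 0x08
DW_FORM_udata = 0x0F


def read_uleb(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_entry_format(buf: bytes, pos: int) -> Tuple[List[Tuple[int, int]], int]:
    """Read a ubyte count followed by that many (content type, form) pairs."""
    count = buf[pos]
    pos += 1
    pairs = []
    for _ in range(count):
        ctype, pos = read_uleb(buf, pos)
        form, pos = read_uleb(buf, pos)
        pairs.append((ctype, form))
    return pairs, pos


def read_value(buf: bytes, pos: int, form: int):
    """Read one value; this sketch handles only two forms."""
    if form == DW_FORM_string:
        end = buf.index(0, pos)  # inline null-terminated string
        return buf[pos:end].decode("utf-8"), end + 1
    if form == DW_FORM_udata:
        return read_uleb(buf, pos)
    raise NotImplementedError("form 0x%x not handled in this sketch" % form)


def read_entries(buf: bytes, pos: int, fmt) -> Tuple[List[Dict[int, object]], int]:
    """Read a uleb count, then that many entries laid out as fmt describes."""
    count, pos = read_uleb(buf, pos)
    entries = []
    for _ in range(count):
        entry = {}
        for ctype, form in fmt:
            entry[ctype], pos = read_value(buf, pos, form)
        entries.append(entry)
    return entries, pos
```

Calling read_entry_format and then read_entries once for the directory
fields and once more for the file name fields walks the whole tail of
the header, because the two groups share the same layout.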
6.2.4.1 Standard Content Descriptions
DWARF-defined content type codes are used to indicate
the type of information that is represented in one
component of an include directory or file name description.
The following type codes are defined.
1. DW_LNCT_path
The component is a null-terminated path name string.
If the associated form code is DW_FORM_string, then the
string occurs immediately in the containing directories
or file_names field. If the form code is DW_FORM_line_strp,
then the string is included in the .debug_line_str section
and its offset occurs immediately in the containing
directories or file_names field.
*Note that this use of DW_FORM_line_strp is similar to
DW_FORM_strp but refers to the .debug_line_str section,
not .debug_str.*
In a .debug_line.dwo section, the form DW_FORM_strx may
also be used. This refers into the .debug_str_offsets.dwo
section (and indirectly also the .debug_str.dwo section)
because no .debug_line_str_offsets.dwo or .debug_line_str.dwo
sections exist or are defined for use in split objects. (The
form DW_FORM_string may also be used, but this precludes the
benefits of string sharing.)
In the 32-bit DWARF format, the representation of a
DW_FORM_line_strp value is a 4-byte unsigned offset; in the
64-bit DWARF format, it is an 8-byte unsigned offset (see
Section 7.4 on page 163).
2. DW_LNCT_directory_index
The unsigned directory index represents an entry in the
directories field of the header. The index is 0 if
the file was found in the current directory of the compilation
(hence, the first directory in the directories field),
1 if it was found in the second directory in the directories
field, and so on.
This content code is always paired with one of DW_FORM_data1,
DW_FORM_data2 or DW_FORM_udata.
*The optimal form for a producer to use (which results in the
minimum size for the set of directory index fields) depends not only
on the number of directories in the directories
field, but potentially on the order in which those directories are
listed and the number of times each is used in the file_names field.
However, DW_FORM_udata is expected to be near optimal in most common
cases.*
3. DW_LNCT_timestamp
DW_LNCT_timestamp indicates that the value is the implementation-
defined time of last modification of the file, or 0 if not
available. It is always paired with one of the forms
DW_FORM_udata, DW_FORM_data4, DW_FORM_data8 or DW_FORM_block.
4. DW_LNCT_size
DW_LNCT_size indicates that the value is the unsigned size of the
file in bytes, or 0 if not available. It is paired with one of the
forms DW_FORM_udata, DW_FORM_data1, DW_FORM_data2, DW_FORM_data4,
or DW_FORM_data8.
5. DW_LNCT_MD5
DW_LNCT_MD5 indicates that the value is a 16-byte MD5 digest
of the file contents. It is paired with form DW_FORM_data16.
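
As an illustration only (not part of the proposed text), a consumer
might combine DW_LNCT_path, DW_LNCT_directory_index and DW_LNCT_MD5
roughly as sketched here. The function names are hypothetical, POSIX
path conventions are assumed, and the file contents are assumed to be
already in memory.

```python
import hashlib
import posixpath
from typing import List


def resolve_path(directories: List[str], path: str, directory_index: int) -> str:
    """Build a full path for a file entry (assumes POSIX-style paths).

    directories[0] is the current directory of the compilation; any other
    entry is either a full path or relative to directories[0].
    """
    comp_dir = directories[0]
    directory = directories[directory_index]
    # posixpath.join discards earlier components when a later one is
    # absolute, so full path entries are used unchanged.
    return posixpath.join(comp_dir, directory, path)


def md5_matches(contents: bytes, recorded_digest: bytes) -> bool:
    """Compare file contents against a 16-byte DW_LNCT_MD5 value."""
    return hashlib.md5(contents).digest() == recorded_digest
```

A mismatch reported by md5_matches suggests that the file the consumer
found is not the version used in the compilation.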
*Using this representation, the information found in a DWARF
Version 4 line number header could be encoded as follows:*
Field    Field Name                     Value(s)
Number
1        *Same as in Version 4*         ...
2        version                        5
3        *Not present in Version 4*     -
4        *Not present in Version 4*     -
5-12     *Same as in Version 4*         ...
13       directory_entry_format_count   1
14       directory_entry_format         DW_LNCT_path, DW_FORM_string
15       directories_count              <n+1>
16       directories                    <n+1>*<null terminated string>
17       file_name_entry_format_count   4
18       file_name_entry_format         DW_LNCT_path, DW_FORM_string,
                                        DW_LNCT_directory_index, DW_FORM_udata,
                                        DW_LNCT_timestamp, DW_FORM_udata,
                                        DW_LNCT_size, DW_FORM_udata
19       file_names_count               <m>
20       file_names                     <m>*{<null terminated string>,
                                        <index>, <timestamp>, <size>}
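
The encoding in the table above can also be shown from the producer
side. The following is a rough sketch, not part of the proposed text.
It assumes the content type codes proposed in Table 7.25 and the
existing codes for DW_FORM_string (0x08) and DW_FORM_udata (0x0f); the
helper names (uleb, cstr, encode_v4_style_tables) are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed numeric values (content types from Table 7.25 of this proposal).
DW_LNCT_path = 0x1
DW_LNCT_directory_index = 0x2
DW_LNCT_timestamp = 0x3
DW_LNCT_size = 0x4
DW_FORM_string = 0x08
DW_FORM_udata = 0x0F


def uleb(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def cstr(text: str) -> bytes:
    """Encode a null-terminated string (DW_FORM_string)."""
    return text.encode("utf-8") + b"\0"


def encode_v4_style_tables(directories: Iterable[str],
                           files: Iterable[Tuple[str, int, int, int]]) -> bytes:
    """Emit fields 13-20 using the Version 4 equivalent formats."""
    directories = list(directories)
    files = list(files)
    out = bytearray()
    out.append(1)                                    # field 13
    out += uleb(DW_LNCT_path) + uleb(DW_FORM_string)  # field 14
    out += uleb(len(directories))                    # field 15
    for name in directories:                         # field 16
        out += cstr(name)
    out.append(4)                                    # field 17
    out += uleb(DW_LNCT_path) + uleb(DW_FORM_string)  # field 18
    for ctype in (DW_LNCT_directory_index, DW_LNCT_timestamp, DW_LNCT_size):
        out += uleb(ctype) + uleb(DW_FORM_udata)
    out += uleb(len(files))                          # field 19
    for name, dir_index, mtime, size in files:       # field 20
        out += cstr(name) + uleb(dir_index) + uleb(mtime) + uleb(size)
    return bytes(out)
```

The directories argument is expected to start with the current
directory of the compilation, matching the <n+1> count in the table.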
6.2.4.2 Vendor-defined Content Descriptions
Vendor-defined content descriptions may be defined using content
type codes in the range DW_LNCT_lo_user to DW_LNCT_hi_user. Each
such code may be combined with one or more forms from the set:
DW_FORM_block, DW_FORM_block1, DW_FORM_block2, DW_FORM_block4,
DW_FORM_data1, DW_FORM_data2, DW_FORM_data4, DW_FORM_data8,
DW_FORM_flag, DW_FORM_line_strp, DW_FORM_sdata, DW_FORM_sec_offset,
DW_FORM_string, DW_FORM_strp, DW_FORM_strx, and DW_FORM_udata.
*If a consumer encounters a vendor-defined content type that
it does not understand, it should skip the content data as though
it was not present.*
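
Skipping is possible because every permitted form has a size that can
be determined from the form alone or from the data itself. The sketch
below is illustrative and not part of the proposed text. It identifies
forms by name rather than numeric code, assumes little-endian data by
default, and takes the offset size (4 for 32-bit DWARF, 8 for 64-bit
DWARF) as a parameter; skip_value and skip_leb are hypothetical names.

```python
FIXED_SIZES = {
    "DW_FORM_data1": 1, "DW_FORM_data2": 2, "DW_FORM_data4": 4,
    "DW_FORM_data8": 8, "DW_FORM_flag": 1,
}
BLOCK_LENGTH_SIZES = {"DW_FORM_block1": 1, "DW_FORM_block2": 2, "DW_FORM_block4": 4}
OFFSET_FORMS = {"DW_FORM_line_strp", "DW_FORM_strp", "DW_FORM_sec_offset"}
LEB_FORMS = {"DW_FORM_sdata", "DW_FORM_udata", "DW_FORM_strx"}


def skip_leb(buf: bytes, pos: int) -> int:
    """Return the position just past one LEB128 value (signed or unsigned)."""
    while buf[pos] & 0x80:
        pos += 1
    return pos + 1


def skip_value(buf: bytes, pos: int, form: str,
               offset_size: int = 4, byteorder: str = "little") -> int:
    """Return the position just past a value of the given form."""
    if form in FIXED_SIZES:
        return pos + FIXED_SIZES[form]
    if form in OFFSET_FORMS:
        return pos + offset_size
    if form in LEB_FORMS:
        return skip_leb(buf, pos)
    if form == "DW_FORM_string":
        return buf.index(0, pos) + 1
    if form in BLOCK_LENGTH_SIZES:
        n = BLOCK_LENGTH_SIZES[form]
        length = int.from_bytes(buf[pos:pos + n], byteorder)
        return pos + n + length
    if form == "DW_FORM_block":
        # The block length is itself an unsigned LEB128 value.
        length = 0
        shift = 0
        while True:
            byte = buf[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        return pos + length
    raise ValueError("form not permitted for vendor-defined content: " + form)
```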
=====================================================================
(3) In Section 6.3.1, bullet 4., second paragraph: add DW_FORM_data16
to the list of forms.
=====================================================================
(4) In Section 7.4, bullet 3., add the following row in the table:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DW_FORM_line_strp .debug_line_str
---------------------------------------------------------------------
=====================================================================
(5) In Section 7.4, insert a new bullet number 4. and renumber the
current 4. and higher appropriately.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
4. Within the body of the .debug_line section, certain forms of content
description depend on the choice of DWARF format as follows. For the
32-bit DWARF format, the value is a 32-bit unsigned integer; for the
64-bit DWARF format, the value is a 64-bit unsigned integer.
Form Role
-----------------------------------------------
DW_FORM_line_strp offset in .debug_line_str
-----------------------------------------------
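
For illustration only (not part of the proposed text), reading a
DW_FORM_line_strp value under either format might look like the
following sketch. The function name and parameters are hypothetical,
the section contents are assumed to be available as byte strings, and
little-endian data is assumed by default.

```python
from typing import Tuple


def read_line_strp(line_data: bytes, pos: int, debug_line_str: bytes,
                   is_64bit: bool, byteorder: str = "little") -> Tuple[str, int]:
    """Read a DW_FORM_line_strp value and fetch the string it refers to.

    Returns (string, next position in line_data).
    """
    size = 8 if is_64bit else 4          # 64-bit vs 32-bit DWARF format
    offset = int.from_bytes(line_data[pos:pos + size], byteorder)
    end = debug_line_str.index(0, offset)  # strings are null-terminated
    return debug_line_str[offset:end].decode("utf-8"), pos + size
```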
=====================================================================
(6) In Section 7.5.4, bullet constant, replace the first two sentences
with:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
There are seven forms of constants. There are fixed length constant
data forms for one, two, four, eight and sixteen byte values
(respectively, DW_FORM_data1, DW_FORM_data2, DW_FORM_data4,
DW_FORM_data8 and DW_FORM_data16).
---------------------------------------------------------------------
In the following paragraph, add DW_FORM_data16 to the list of fixed length
forms.
=====================================================================
(7) In Section 7.5.4, in the bullet for string, in the second unnumbered
bullet, replace the first sentence (which begins "as an offset") with:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- as an offset into a string table contained in either the .debug_str
section (DW_FORM_strp) or the .debug_line_str section (DW_FORM_line_strp)
of the object file.
---------------------------------------------------------------------
=====================================================================
(8) Replace Table 7.25 and its introductory sentence with
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The format encodings used for the directory and file name entries
in a line number table header are given in Table 7.25.
Table 7.25: Line number content type encodings
File entry content type      Value
---------------------------------
DW_LNCT_path +               0x1
DW_LNCT_directory_index +    0x2
DW_LNCT_timestamp +          0x3
DW_LNCT_size +               0x4
DW_LNCT_MD5 +                0x5
DW_LNCT_lo_user +            0x2000
DW_LNCT_hi_user +            0x3fff
---------------------------------
+ New in DWARF Version 5
=====================================================================
(9) In Appendix G, add this line in the table:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
.debug_line_str - - - *
---------------------------------------------------------------------
=====================================================================
(10) In Appendix B, add the following in Figure B.1 emanating from
((.debug_line)):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-> [[ DW_FORM_line_strp (r) ]] -> (( .debug_line_str ))
---------------------------------------------------------------------
where -> indicates an arrow, [[]] indicates a square box and (())
indicates an oval box,
and in the Notes add:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(r) Path contents in a line number table header can be represented
by an offset to the beginning of a string in the .debug_line_str
section using DW_FORM_line_strp.
DW_FORM_line_strp may also be used for attributes in the .debug_info
section, notably DW_AT_comp_dir (not shown).
---------------------------------------------------------------------
Drawing an arrow from the .debug_line oval to the .debug_line_str oval
is not practical. Should I try to invent some kind of connector bubble,
or just let the above note suffice?
=====================================================================
(11) Add .debug_line_str to the lists in Sections 1.4, 6.2, 7.3.2 and 7.29.
=====================================================================
(12) In Table 7.6, add
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DW_FORM_data16 + 0x1c
---------------------------------------------------------------------
=====================================================================
(13) In Appendix Section F.1, at the end of the bullet for
.debug_line.dwo, append the following:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In a .dwo file there is no benefit to having a separate string
section for directories and file names because the primary
string table will never be stripped. Accordingly, no
.debug_line_str.dwo is defined. Content descriptions corresponding
to DW_FORM_line_strp in an executable file (for example, in the
skeleton compilation unit) instead use DW_FORM_strx. This allows
directory and file name strings to be merged with general
strings and across compilations in package files (which are not
subject to potential stripping).
--
Revised 9/9/14. Previous version: http://dwarfstd.org/issues/140724.1-1.html
Accepted with modifications 9/26/14:
File zero is primary source file.
Define_file and define_file_md5 deprecated, codes reserved.