tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Tinycc-devel] Unify C and asm symbols


From: Michael Matz
Subject: Re: [Tinycc-devel] Unify C and asm symbols
Date: Sun, 10 Dec 2017 06:50:13 +0100 (CET)
User-agent: Alpine 2.20 (LSU 67 2015-01-07)

Hi,

On Wed, 6 Dec 2017, grischka wrote:

My tests go through also without this hunk, so it's either an optimization (but then it should be tested before the strcmp) or it's a left-over from during development of your patch, or it's for a situation I don't know yet (perhaps one of the multi-file ones?)

f1.c:
     asm("call x1; ret; x1: ...");
f2.c:
     void x1() {}

MAY cause
     "error: 'x1' defined twice"

Yeah, that's whay I meant with multi-file mode problems ...

3) instead of (2), call rebuild_hash() at the end of each file.

Where all of 1..3 still have this problem:
f1.c:
   x1() { printf("x1(1)\n"); }
   main() { x1(); main_2(); }
f2.c:
   main_2() { asm("call x1"); }
   static x1() { printf("x1(2)\n"); }

... and this.

4) create asm symbols as "static" first always. This however needs to
   post-process asm symbol again (to convert any still undefined symbols
   to globals),  as well as to patch relocation entries if such global
   already was defined in a previous file.

And yes, using static asm symbols from the beginning is the only good solution (I see) to the above multi-file problems, with the associated fallout problems from that, as you noticed: newly globalized symbols need to be in the ELF hash table, the "old" local symbols might already be used in emitted relocs and undefined-at-end must be global.

See attached (additional) patch.  It definitely would add some lines
but then again I wasn't able to produce any problems w.r.t. multi-files
with that.

Comments ? ;)

I wasn't a big fan of the reloc rewriting, and when thinking about the above multi-file problems I had a more linker centric approach in mind.

So what I came up with is below: doing a proper regular-symbol resolution step: basically similar to what's done while reading in .o files. Advantages: no rewriting of reloc entries (which is expensive in your patch as you iterate all relocs for each into-global change), only needing to rebuild the hash table once or twice per output file (or -run), the undef globalization can be folded into it and it's overall a bit shorter (30 lines added) :)

This is on top of your last two patches (i.e. without the static-asm-sym patch). If you don't beat me to it I think I'll push your two and this one somewhen tomorrow; together they (and even just your two) are a clear progression (and all three together even source line number neutral! ;) )

What do you think?


Ciao,
Michael.

commit c47d42823da2bb2908ce6a60452a24ba58b1b273
Author: Michael Matz <address@hidden>
Date:   Sun Dec 10 06:18:27 2017 +0100

    Fix some multi-file corner cases with asm

    for this we have to create also asm symbols as VT_STATIC initially
    except if there's an indication that it should be global (.globl
    or undefined at end of unit).  For this to work we need to
    be able to globalize symbols after they were local and enter them
    into the ELF hash tables, and also adjust the symbols that were
    potentially already used in relocs when they were still local.

    The easiest is to do a proper symbol resolution step also in multi-file
    mode, for regular symbols (the non-dynamic ones, i.e. not from shared
    libs).
---
 tcc.h    |  4 +--
 tccasm.c | 21 +++++-----------
 tccelf.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++----------------
 tccpe.c  |  3 +--
 tccrun.c |  3 +--
 5 files changed, 74 insertions(+), 44 deletions(-)

diff --git a/tcc.h b/tcc.h
index 38bbda8..926ac5d 100644
--- a/tcc.h
+++ b/tcc.h
@@ -911,7 +911,6 @@ struct filespec {

 /* symbol was created by tccasm.c first */
 #define VT_ASM (VT_VOID | VT_UNSIGNED)
-#define VT_ASM_GLOBAL VT_DEFSIGN
 #define IS_ASM_SYM(sym) (((sym)->type.t & (VT_BTYPE | VT_ASM)) == VT_ASM)

 /* token values */
@@ -1426,11 +1425,10 @@ ST_FUNC void put_stabs_r(const char *str, int type, int 
other, int desc, unsigne
 ST_FUNC void put_stabn(int type, int other, int desc, int value);
 ST_FUNC void put_stabd(int type, int other, int desc);

-ST_FUNC void relocate_common_syms(void);
+ST_FUNC void resolve_regular_syms(void);
 ST_FUNC void relocate_syms(TCCState *s1, Section *symtab, int do_resolve);
 ST_FUNC void relocate_section(TCCState *s1, Section *s);

-ST_FUNC void tcc_add_linker_symbols(TCCState *s1);
 ST_FUNC int tcc_object_type(int fd, ElfW(Ehdr) *h);
 ST_FUNC int tcc_load_object_file(TCCState *s1, int fd, unsigned long 
file_offset);
 ST_FUNC int tcc_load_archive(TCCState *s1, int fd);
diff --git a/tccasm.c b/tccasm.c
index 4fe3f32..355d158 100644
--- a/tccasm.c
+++ b/tccasm.c
@@ -42,13 +42,12 @@ static Sym *asm_label_find(int v)
     return sym;
 }

-static Sym *asm_label_push(int v, int t)
+static Sym *asm_label_push(int v)
 {
-    Sym *sym = global_identifier_push(v, t, 0);
     /* We always add VT_EXTERN, for sym definition that's tentative
        (for .set, removed for real defs), for mere references it's correct
        as is.  */
-    sym->type.t |= VT_ASM | VT_EXTERN;
+    Sym *sym = global_identifier_push(v, VT_ASM | VT_EXTERN | VT_STATIC, 0);
     sym->r = VT_CONST | VT_SYM;
     return sym;
 }
@@ -67,7 +66,7 @@ ST_FUNC Sym* get_asm_sym(int name, Sym *csym)
 {
     Sym *sym = asm_label_find(name);
     if (!sym) {
-       sym = asm_label_push(name, 0);
+       sym = asm_label_push(name);
        if (csym)
          sym->c = csym->c;
     }
@@ -102,7 +101,7 @@ static void asm_expr_unary(TCCState *s1, ExprValue *pe)
                 /* forward */
                 if (!sym || (sym->c && elfsym(sym)->st_shndx != SHN_UNDEF)) {
                     /* if the last label is defined, then define a new one */
-                   sym = asm_label_push(label, VT_STATIC);
+                   sym = asm_label_push(label);
                 }
             }
            pe->v = 0;
@@ -381,7 +380,7 @@ static Sym* asm_new_label1(TCCState *s1, int label, int 
is_local,
         }
     } else {
     new_label:
-        sym = asm_label_push(label, is_local == 1 ? VT_STATIC : 0);
+        sym = asm_label_push(label);
     }
     if (!sym->c)
       put_extern_sym2(sym, NULL, 0, 0, 0);
@@ -392,11 +391,6 @@ static Sym* asm_new_label1(TCCState *s1, int label, int 
is_local,
     if (is_local != 2)
         sym->type.t &= ~VT_EXTERN;

-    if (IS_ASM_SYM(sym) && !(sym->type.t & VT_ASM_GLOBAL)) {
-        sym->type.t |= VT_STATIC;
-        update_storage(sym);
-    }
-
     return sym;
 }

@@ -679,11 +673,8 @@ static void asm_parse_directive(TCCState *s1, int global)
             Sym *sym;
             next();
             sym = get_asm_sym(tok, NULL);
-           if (tok1 != TOK_ASMDIR_hidden) {
+           if (tok1 != TOK_ASMDIR_hidden)
                 sym->type.t &= ~VT_STATIC;
-                if (IS_ASM_SYM(sym))
-                    sym->type.t |= VT_ASM_GLOBAL;
-            }
             if (tok1 == TOK_ASMDIR_weak)
                 sym->a.weak = 1;
            else if (tok1 == TOK_ASMDIR_hidden)
diff --git a/tccelf.c b/tccelf.c
index 657aa61..6b2e227 100644
--- a/tccelf.c
+++ b/tccelf.c
@@ -299,6 +299,9 @@ static void rebuild_hash(Section *s, unsigned int 
nb_buckets)
     strtab = s->link->data;
     nb_syms = s->data_offset / sizeof(ElfW(Sym));

+    if (!nb_buckets)
+        nb_buckets = ((int*)s->hash->data)[0];
+
     s->hash->data_offset = 0;
     ptr = section_ptr_add(s->hash, (2 + nb_buckets + nb_syms) * sizeof(int));
     ptr[0] = nb_buckets;
@@ -370,9 +373,7 @@ ST_FUNC int put_elf_sym(Section *s, addr_t value, unsigned 
long size,
     return sym_index;
 }

-/* find global ELF symbol 'name' and return its index. Return 0 if not
-   found. */
-ST_FUNC int find_elf_sym(Section *s, const char *name)
+static int find_elf_sym_1(Section *s, const char *name, int onlydef)
 {
     ElfW(Sym) *sym;
     Section *hs;
@@ -388,13 +389,21 @@ ST_FUNC int find_elf_sym(Section *s, const char *name)
     while (sym_index != 0) {
         sym = &((ElfW(Sym) *)s->data)[sym_index];
         name1 = (char *) s->link->data + sym->st_name;
-        if (!strcmp(name, name1) && ELFW(ST_BIND)(sym->st_info) != STB_LOCAL)
+        if (!strcmp(name, name1) && ELFW(ST_BIND)(sym->st_info) != STB_LOCAL
+           && (!onlydef || sym->st_shndx != SHN_UNDEF))
             return sym_index;
         sym_index = ((int *)hs->data)[2 + nbuckets + sym_index];
     }
     return 0;
 }

+/* find global ELF symbol 'name' and return its index. Return 0 if not
+   found. */
+ST_FUNC int find_elf_sym(Section *s, const char *name)
+{
+    return find_elf_sym_1(s, name, 0);
+}
+
 /* return elf symbol value, signal error if 'err' is nonzero */
 ST_FUNC addr_t get_elf_sym_addr(TCCState *s, const char *name, int err)
 {
@@ -716,21 +725,6 @@ static void sort_syms(TCCState *s1, Section *s)
     tcc_free(old_to_new_syms);
 }

-/* relocate common symbols in the .bss section */
-ST_FUNC void relocate_common_syms(void)
-{
-    ElfW(Sym) *sym;
-
-    for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
-        if (sym->st_shndx == SHN_COMMON) {
-            /* symbol alignment is in st_value for SHN_COMMONs */
-           sym->st_value = section_add(bss_section, sym->st_size,
-                                       sym->st_value);
-            sym->st_shndx = bss_section->sh_num;
-        }
-    }
-}
-
 /* relocate symbol table, resolve undefined symbols if do_resolve is
    true and output error if undefined symbol. */
 ST_FUNC void relocate_syms(TCCState *s1, Section *symtab, int do_resolve)
@@ -1167,7 +1161,7 @@ ST_FUNC void tcc_add_runtime(TCCState *s1)
 /* add various standard linker symbols (must be done after the
    sections are filled (for example after allocating common
    symbols)) */
-ST_FUNC void tcc_add_linker_symbols(TCCState *s1)
+static void tcc_add_linker_symbols(TCCState *s1)
 {
     char buf[1024];
     int i;
@@ -1226,6 +1220,48 @@ ST_FUNC void tcc_add_linker_symbols(TCCState *s1)
     }
 }

+/* Do final regular symbol preparation (for those coming from .c/.o/.s files,
+   not from shared libs)  */
+ST_FUNC void resolve_regular_syms(void)
+{
+    int rebuild = 0;
+    ElfW(Sym) *sym;
+
+    /* Allocate common symbols in BSS.  */
+    for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
+        if (sym->st_shndx == SHN_COMMON) {
+            /* symbol alignment is in st_value for SHN_COMMONs */
+           sym->st_value = section_add(bss_section, sym->st_size,
+                                       sym->st_value);
+            sym->st_shndx = bss_section->sh_num;
+        }
+    }
+
+    /* Now assign linker provided symbols their value.  */
+    tcc_add_linker_symbols(tcc_state);
+
+    /* And finally resolve still UNDEF symbols (for multi-file mode),
+       and globalize those that are still UNDEF.  */
+    rebuild_hash(symtab_section, 0);
+    for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
+       if (sym->st_shndx == SHN_UNDEF) {
+            const char *name = (char *) symtab_section->link->data + 
sym->st_name;
+           int symndx = find_elf_sym_1(symtab_section, name, 1);
+            if (symndx) {
+                *sym = ((ElfSym *)symtab_section->data)[symndx];
+               rebuild = 1;
+           } else if (ELFW(ST_BIND)(sym->st_info) == STB_LOCAL) {
+               sym->st_info
+                   = ELFW(ST_INFO)(STB_GLOBAL, ELFW(ST_TYPE)(sym->st_info));
+               rebuild = 1;
+           }
+       }
+    }
+    if (rebuild)
+       rebuild_hash(symtab_section, 0);
+}
+
+
 static void tcc_output_binary(TCCState *s1, FILE *f,
                               const int *sec_order)
 {
@@ -2016,8 +2052,7 @@ static int elf_output_file(TCCState *s1, const char 
*filename)
     if (file_type != TCC_OUTPUT_OBJ) {
         /* if linking, also link in runtime libraries (libc, libgcc, etc.) */
         tcc_add_runtime(s1);
-        relocate_common_syms();
-        tcc_add_linker_symbols(s1);
+       resolve_regular_syms();

         if (!s1->static_link) {
             if (file_type == TCC_OUTPUT_EXE) {
@@ -2058,6 +2093,14 @@ static int elf_output_file(TCCState *s1, const char 
*filename)
             }
         }
         build_got_entries(s1);
+    } else {
+       for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
+           if (sym->st_shndx == SHN_UNDEF
+               && ELFW(ST_BIND)(sym->st_info) == STB_LOCAL) {
+               sym->st_info
+                   = ELFW(ST_INFO)(STB_GLOBAL, ELFW(ST_TYPE)(sym->st_info));
+           }
+       }
     }

     /* we add a section for symbols */
diff --git a/tccpe.c b/tccpe.c
index a67023d..af4438d 100644
--- a/tccpe.c
+++ b/tccpe.c
@@ -1970,8 +1970,7 @@ ST_FUNC int pe_output_file(TCCState *s1, const char 
*filename)

     tcc_add_bcheck(s1);
     pe_add_runtime(s1, &pe);
-    relocate_common_syms(); /* assign bss addresses */
-    tcc_add_linker_symbols(s1);
+    resolve_regular_syms();
     pe_set_options(s1, &pe);

     ret = pe_check_symbols(&pe);
diff --git a/tccrun.c b/tccrun.c
index d0501a0..72b3994 100644
--- a/tccrun.c
+++ b/tccrun.c
@@ -188,8 +188,7 @@ static int tcc_relocate_ex(TCCState *s1, void *ptr, addr_t 
ptr_diff)
         pe_output_file(s1, NULL);
 #else
         tcc_add_runtime(s1);
-        relocate_common_syms();
-        tcc_add_linker_symbols(s1);
+       resolve_regular_syms();
         build_got_entries(s1);
 #endif
         if (s1->nb_errors)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]