bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Named symbol references


From: Akim Demaille
Subject: Re: [PATCH] Named symbol references
Date: Sun, 28 Jun 2009 13:25:45 +0200


Le 27 juin 09 à 16:54, Alex Rozenman a écrit :

Hello,

According to Akim's permission, I pushed my changes to master. I'm also
going to "cherry-pick" it to bison-2.5. Hope I did everything correct.
Please tell me if you see any problem.


Hi Alex,

This is a great day for Bison, I'm very happy we can finally play with
your patch.  Below there are a few comments.  We also need so
attractive NEWS entry for these changes.  Your change introduces new
tabulations, and I'm trying to get rid of these, as they pollute the
indentation when discussing patches for instance.  If you can, please,
remove them.  Removing them globally in Bison right now is not a good
idea, we might do that later.


> diff --git a/src/local.mk b/src/local.mk
> index bdf9d2e..9ee1824 100644
> --- a/src/local.mk
> +++ b/src/local.mk
> @@ -104,6 +104,8 @@ src_bison_SOURCES =                            \
>    src/tables.c                                    \
>    src/tables.h                                    \
>    src/uniqstr.c                                   \
> -  src/uniqstr.h
> +  src/uniqstr.h                                   \
> +  src/named-ref.c                         \
> +  src/named-ref.h

Please, keep them sorted alphabetically.


> diff --git a/src/named-ref.h b/src/named-ref.h
> new file mode 100644
> index 0000000..9100296
> --- /dev/null
> +++ b/src/named-ref.h
> +struct named_ref
> +{
> +  uniqstr id;
> +  location loc;
> +};
> +
> +named_ref *named_ref_new (uniqstr id, location loc);
> +
> +void named_ref_free (named_ref *r);

Could you please add some documentation for these
structures/functions?


> diff --git a/src/parse-gram.y b/src/parse-gram.y
> index 8718a6d..0fbb8cb 100644
> --- a/src/parse-gram.y
> +++ b/src/parse-gram.y
> @@ -30,6 +30,7 @@
>  #include "quotearg.h"
>  #include "reader.h"
>  #include "symlist.h"
> +#include "named-ref.h"
>  #include "scan-gram.h"
>  #include "scan-code.h"

When the order does not matter, alphabetic order is perferred (but I
agree it is not used everywhere).


> @@ -167,6 +170,7 @@ static int current_prec = 0;
>  %token TAG             "<tag>"
>  %token TAG_ANY         "<*>"
>  %token TAG_NONE        "<>"
> +%token BRACKETED_ID    "[id]"

[identifier]


> @@ -529,6 +536,14 @@ rhs:
>      { grammar_current_rule_merge_set ($3, @3); }
>  ;
>
> +named_ref.opt:
> +  /* Nothing.  */
> +    { $$ = 0; }
> +|
> +  BRACKETED_ID
> +    { $$ = named_ref_new($1, @1); }
> +;

These fit on two single lines.



> @@ -328,6 +348,9 @@ grammar_midrule_action (void)
>    symbol *dummy = dummy_symbol_get (dummy_location);
> symbol_list *midrule = symbol_list_sym_new (dummy, dummy_location);
>
> +  /* Remember named_ref of previous action */

Please end comments with a period.


> @@ -402,24 +425,40 @@ grammar_current_rule_merge_set (uniqstr name, location loc)
>     action as a mid-rule action.  */
>
>  void
> -grammar_current_rule_symbol_append (symbol *sym, location loc)
> +grammar_current_rule_symbol_append (symbol *sym, location loc,
> +                              named_ref *named_ref)
>  {
> +  symbol_list *p;
>    if (current_rule->action_props.code)
>      grammar_midrule_action ();
> -  grammar_symbol_append (sym, loc);
> +  p = grammar_symbol_append (sym, loc);
> +
> +  if (named_ref)
> +    {
> +       if (named_ref->id == sym->tag)
> +  {
> +    warn_at (named_ref->loc,
> +             _("duplicated symbol name for %s ignored"),
> +             quote (sym->tag));
> +    named_ref_free (named_ref);

I feel I already read this before, please factor.  Maybe a function to
add a named reference to a symbol.

> +  }
> +       else
> +   p->named_ref = named_ref;
> +    }
>  }


> diff --git a/src/scan-code.l b/src/scan-code.l
> index 88f8990..4c91fd1 100644
> --- a/src/scan-code.l
> +++ b/src/scan-code.l
> @@ -48,6 +48,10 @@ YY_DECL;
>  static void handle_action_dollar (symbol_list *rule, char *cp,
>                              location dollar_loc);
> static void handle_action_at (symbol_list *rule, char *cp, location at_loc);
> +
> +/* A string to be pushed to obstack after dollar/at has been handled */

Period.


> @@ -75,6 +80,12 @@ tag      [^\0\n>]+
>     white space between the backslash and the newline.  */
>  splice     (\\[ \f\t\v]*\n)*
>
> +/* C style identifier. Must start with letter. Will be used for
> +   named symbol references. */
> +letter      [-.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
> +id          {letter}({letter}|[0-9])*
> +ref      -?[0-9]+|{id}|"["{id}"]"|"$"

Too bad we can't factor this.  Please, leave a note so that both
scan-code.l and scan-gram.l point to each other on this regard.

> +static inline bool
> +symbol_list_null(symbol_list *l)
> +{
> + if (l && !(l->content_type == SYMLIST_SYMBOL && l->content.sym == NULL))
> +    return false;
> +  else
> +    return true;
> +}

Should be in symlist.[hc].  And please, reverse the test and return
directly the Boolean expression.  Use (p) or (!p) rather than making
NULL explicit.


> +static inline bool
> +is_dot_or_dash(char ch)
> +{
> +  return ch == '.' || ch == '-';
> +}
> +
> +static inline bool
> +is_digit(char ch)
> +{
> +  return '0' <= ch && ch <= '9';

Not sure this is right according to the asumptions in Bison: I don't
think we require a charset that features this property.  Have a look
elsewhere, looking for isdigit for instance.  See also Paul Eggert's
code in is_identifier.

> +static inline bool
> +contains_dot_or_dash(const char* str)
> +{
> +  return strpbrk(str, ".-") != NULL;
> +}

I don't think we use strpbrk so far.  So have a look at gnulib, we
might need replacement.


> +
> +#define VARIANT_HIDDEN (1 << 0)
> +#define VARIANT_BAD_BRACKETING (1 << 1)
> +#define VARIANT_NOT_VISIBLE_FROM_MIDRULE (1 << 2)

Please, add a bit of documentation for these guys.

> +
> +typedef struct
> +{
> +  /* Index in symbol list. */
> +  long int ind;

Needs to be signed?  Long is quite large, unsigned probably suffices,
but please check for overflows.  There is no exisiting typedef for this?
"ind" is a bit cryptic to me eyes, "index" is not too long.

> +  /* Hidding named reference. */

hiding?


> +static variant *
> +variant_table_grow()
> +{
> +  ++variant_count;
> +  if (variant_count > variant_table_size)
> +    {
> +      while (variant_count > variant_table_size)
> +  variant_table_size = 2 * variant_table_size + 3;
> +      variant_table = xnrealloc (variant_table, variant_table_size,
> +                           sizeof *variant_table);
> +    }
> +  return &variant_table[variant_count - 1];
> +}

There is no support for growing tables in gnulib?


> +static char *
> +find_prefix_end(const char *prefix, char *begin, char *end)
> +{
> +  char *ptr = begin;
> +
> +  while (*prefix && ptr != end)
> +    {
> +      if (*prefix != *ptr)
> +  return 0;
> +      ++prefix, ++ptr;
> +    }

This really looks like a for-loop.

> +#define INVALID_REF (INT_MIN)
> +#define LHS_REF (INT_MIN + 1)
> +
> +static long int
> +parse_named_ref(char *cp, symbol_list *rule, int rule_length,
> +          int midrule_rhs_index, char *text, location loc,
> +          char dollar_or_at)

More doc please.  What is "long int"?  Probably the domain for ind, so
please introduce some "variant_index" typedef.

> +{
> +  symbol_list *l;
> +  char *cp_end;
> +  bool exact_mode;
> +  bool has_error;
> +  bool has_valid;
> +  long int ind, i;
> +  variant* variant;
> +  char* p;

Please, reduce the size of the scopes of these variables, possibly
introducing { } to introduce new, smaller, scopes.  Possibly
intermediate smaller functions.

> +  if ('$' == *cp)
> +    return LHS_REF;
> +
> +  if (is_digit (*cp) || (*cp == '-' && is_digit (* (cp + 1))))
> +    {
> +      long int num = strtol (cp, &cp, 10);
> +      if (1 - INT_MAX + rule_length <= num && num <= rule_length)
> +  return num;
> +      else
> +  {
> +    complain_at (loc, _("integer out of range: %s"), quote (text));
> +    return INVALID_REF;
> +  }

Can't we factor our strtol invocations?


> +  if ('[' == *cp)
> +    {
> +      exact_mode = true;
> +
> +      /* Ignore the brackets. */
> +      ++cp;
> +      for (p = cp; *p != ']'; ++p);

Use "continue" as the body.

> +      for (p = cp; *p; ++p);

Ditto.


> +      /* Check visibility from mid-rule actions. */
> +      if (midrule_rhs_index != 0 &&

Put && at the beginning of the continuing line.

> +    (ind == 0 || ind > midrule_rhs_index))

Prefer "<" to ">".


> +  {
> +    variant->err |= VARIANT_NOT_VISIBLE_FROM_MIDRULE;
> +    has_error = true;
> +  }
> +
> +      /* Check correct bracketing. */
> +      if (!exact_mode && contains_dot_or_dash(variant->id))
> +  {
> +    variant->err |= VARIANT_BAD_BRACKETING;
> +    has_error = true;
> +  }
> +
> +      /* Check using of hidden symbols. */
> +      if (variant->hidden_by != NULL)
> +  {
> +    variant->err |= VARIANT_HIDDEN;
> +    has_error = true;
> +  }
> +
> +      if (!variant->err)
> +  has_valid = true;
else
        has_error = true;

should factor the three previous cases.  In which case, please swap
the test to make it positive:

    if (variant->err)
      has_error = false;
    else
      has_valid = true;

> +  if (variant_count == 1 && has_valid)
> +    {
> +      /* The only "good" case is here. */
> +      ind = variant_table[0].ind;
> +      if (ind == midrule_rhs_index)
> +  return LHS_REF;
> +      else
> +  return ind;

This is an example of where "ind" should be short lived.


> +  if (variant_count == 0)
> + complain_at (loc, _("reference is invalid: %s, symbol not found"),
> +           quote (text));
> +  else if (variant_count > 1 && !has_error)
> +    complain_at (loc, _("reference is ambiguous: %s"),
> +           quote (text));
> +  else if (variant_count > 1 && has_valid && has_error)
> +    complain_at (loc, _("reference is misleading: %s"),
> +           quote (text));
> +  else
> +    complain_at (loc, _("reference is invalid: %s"),
> +           quote (text));

I believe we avoid sentences with verbs when we can, so

  ambiguous reference: ...
  misleading reference: ...
  invalid reference: ...



> +  for (i = 0; i < variant_count; ++i)
> +    {
> +      static char at_buf[20];

Eeerk.

> +
> +      variant = &variant_table[i];
> +
> +      if (variant->ind == 0)
> +  strcpy(at_buf, "$$");
> +      else
> +  snprintf(at_buf, sizeof(at_buf), "$%d", variant->ind);

No braces for these sizeofs.


> +    static struct obstack msg_buf;
> +    const char *tail = "";
> +    const char *id;
> +    location loc;
> +
> +    if (!exact_mode)
> +      tail = cp + strlen(variant->id);
> +
> +    if (variant->hidden_by)
> +      {
> +        id = variant->hidden_by->id;
> +        loc = variant->hidden_by->loc;
> +      }
> +    else
> +      {
> +        id = variant->id;
> +        loc = variant->loc;
> +      }

Prefer using ? : in the declaration, even if means twice the
condition:

const char* id = variant->hidden_by ? variant->hidden_by->id : variant->id

or even

   ...* reference = variant->hidden_by ? variant->hidden_by : variant
  const char* id = reference->id;
  location loc = reference->loc;



> +    if (contains_dot_or_dash (id))
> +      obstack_fgrow1 (&msg_buf, "[%s]", id);
> +    else
> +      obstack_sgrow (&msg_buf, id);
> +    obstack_sgrow (&msg_buf, tail);
> +
> +    if (variant->err & VARIANT_HIDDEN)
> +      {
> +        obstack_fgrow1 (&msg_buf, ", hiding %c", dollar_or_at);
> +        if (contains_dot_or_dash (variant->id))
> +          obstack_fgrow1 (&msg_buf, "[%s]", variant->id);
> +        else
> +          obstack_sgrow (&msg_buf, variant->id);
> +        obstack_sgrow (&msg_buf, tail);

This pattern should be factored.


> @@ -286,8 +615,9 @@ handle_action_dollar (symbol_list *rule, char *text, location dollar_loc)
>  {
>    char const *type_name = NULL;
>    char *cp = text + 1;
> +  char *gt_ptr = 0;
>    symbol_list *effective_rule;
> -  int effective_rule_length;
> +  int effective_rule_length, n;

Officially the GNU Coding standard forbid declaring several variables
in a single line.  Not a big problem with me, but...


> @@ -459,6 +803,7 @@ code_props_plain_init (code_props *self, char const *code, location code_loc)
>    self->location = code_loc;
>    self->is_value_used = false;
>    self->rule = NULL;
> +  self->named_ref = NULL;
>  }
>
>  void
> @@ -470,17 +815,20 @@ code_props_symbol_action_init (code_props *self, char const *code,
>    self->location = code_loc;
>    self->is_value_used = false;
>    self->rule = NULL;
> +  self->named_ref = NULL;
>  }

Can't we factor these two?


>  void
>  code_props_rule_action_init (code_props *self, char const *code,
> -                             location code_loc, symbol_list *rule)
> +                             location code_loc, symbol_list *rule,
> +                       named_ref *named_ref)
>  {
>    self->kind = CODE_PROPS_RULE_ACTION;
>    self->code = code;
>    self->location = code_loc;
>    self->is_value_used = false;
>    self->rule = rule;
> +  self->named_ref = named_ref;

And this one too...



> diff --git a/src/scan-gram.l b/src/scan-gram.l
> index 7c5b600..f71409e 100644
> --- a/src/scan-gram.l
> +++ b/src/scan-gram.l
> @@ -61,10 +61,21 @@ static size_t no_cr_read (FILE *, char *, size_t);
>      return PERCENT_FLAG;                        \
>    } while (0)
>
> +#define ROLLBACK_CURRENT_TOKEN                                  \
> +  do {                                                          \
> +    scanner_cursor.column -= mbsnwidth (yytext, yyleng, 0);       \
> +    yyless (0);                                                 \
> +  } while (0)

We need extensive documentation here explaining why/when this is
needed.  I really mean a single large comment (rather than many small
ones) explaining what makes scanning for named reference non trivial,
and all the bits that, put together, implement it.


> +/* Bracketed identifier */

Period.

> @@ -314,25 +335,94 @@ splice        (\\[ \f\t\v]*\n)*
>
>  <SC_AFTER_IDENTIFIER>
>  {
> +  "[" {
> +    if (!bracketed_id_str)

Prefer positif if.


> +  /*--------------------------------.
> +  | Scanning bracketed identifiers. |
> +  `--------------------------------*/
> +
> +<SC_BRACKETED_ID>
> +{
> +  {id} {
> +    if (!bracketed_id_str)

Likewise.

> +      {
> +  bracketed_id_str = uniqstr_new (yytext);
> +  bracketed_id_loc = *loc;
> +      }
> +    else
> +      {
> +  complain_at (*loc, _("redundant identifier in bracketed name: %s"),
> +               quote (yytext));
> +      }
> +  }



> diff --git a/src/symlist.c b/src/symlist.c
> index 6c6b57d..974d974 100644
> --- a/src/symlist.c
> +++ b/src/symlist.c
> @@ -35,7 +35,7 @@ symbol_list_sym_new (symbol *sym, location loc)
>
>    res->content_type = SYMLIST_SYMBOL;
>    res->content.sym = sym;
> -  res->location = loc;
> +  res->location = res->sym_loc = loc;
>
>    res->midrule = NULL;
>    res->midrule_parent_rule = NULL;
> @@ -47,6 +47,8 @@ symbol_list_sym_new (symbol *sym, location loc)
>    res->dprec = 0;
>    res->merger = 0;
>
> +  res->named_ref = NULL;
> +
>    res->next = NULL;
>
>    return res;
> @@ -64,7 +66,8 @@ symbol_list_type_new (uniqstr type_name, location loc)
>
>    res->content_type = SYMLIST_TYPE;
>    res->content.type_name = type_name;
> -  res->location = loc;
> +  res->location = res->sym_loc = loc;
> +  res->named_ref = NULL;
>    res->next = NULL;
>
>    return res;
> @@ -81,7 +84,8 @@ symbol_list_default_tagged_new (location loc)
>    symbol_list *res = xmalloc (sizeof *res);
>
>    res->content_type = SYMLIST_DEFAULT_TAGGED;
> -  res->location = loc;
> +  res->location = res->sym_loc = loc;
> +  res->named_ref = NULL;

Obviously there are opportunities for factoring in Bison...  Note to
self :)


>    return res;
> @@ -98,7 +102,8 @@ symbol_list_default_tagless_new (location loc)
>    symbol_list *res = xmalloc (sizeof *res);
>
>    res->content_type = SYMLIST_DEFAULT_TAGLESS;
> -  res->location = loc;
> +  res->location = res->sym_loc = loc;
> +  res->named_ref = NULL;
>    res->next = NULL;
>
>    return res;





> diff --git a/tests/named-refs.at b/tests/named-refs.at
> new file mode 100644
> index 0000000..d8fd2ec
> --- /dev/null
> +++ b/tests/named-refs.at
> @@ -0,0 +1,457 @@
> +# Named references test.                           -*- Autotest -*-
> +
> +# Copyright (C) 2009 Free Software
> +# Foundation, Inc.

Single line.


> +AT_BANNER([[Named references tests.]])
> +
> +AT_SETUP([Tutorial calculator])
> +
> +AT_DATA_GRAMMAR([test.y],
> +[[
> +%{
> +#include <stdio.h>
> +#include <stdlib.h>

There is a lot of code duplication with calc.at here.  We should find
a means to factor all this.




> +#######################################################################
> +
> +AT_SETUP([Many kinds of errors])
> +AT_DATA_GRAMMAR([test.y],
> +[[
> +%token IDENT
> +%token NUMBER
> +%token ASSIGNOP
> +%token IF
> +%token IF1
> +%token THEN
> +%token ELSE
> +%token FI
> +%token WHILE
> +%token DO
> +%token OD
> +%start program
> +%%

Using "if" "then" and so forth, you don't need to declare these tokens.


> +if_stmt1: IF expr[cond] THEN stmt[then] ELSE stmt.list[else] FI
> +          { $if_stmt1 = new IfStmt($cond1, $then.f1, $else); };


> +#######################################################################
> +
> +AT_SETUP([Missing identifiers in brackets])
> +AT_DATA_GRAMMAR([test.y],
> +[[
> +%%
> +start: foo[] bar
> +  { s = $foo; }
> +]])
> +AT_BISON_CHECK([-o test.c test.y], 1, [],
> +[[test.y:11.12: a non empty identifier expected

It seems clear to me that identifiers cannot be empty, so I would
rather report simply "expected identifier" (as a Bison parser would
have, had the "natural" parsing been possible here).


> +start: foo[ a d ] bar
> +  { s = $foo; }
> +]])
> +AT_BISON_CHECK([-o test.c test.y], 1, [],
> +[[test.y:11.15: redundant identifier in bracketed name: `d'

unexpected identifier.




This is great work, thanks a lot!





reply via email to

[Prev in Thread] Current Thread [Next in Thread]