lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Code review request


From: Vadim Zeitlin
Subject: Re: [lmi] Code review request
Date: Tue, 22 Sep 2020 16:09:32 +0200

On Tue, 22 Sep 2020 11:26:35 +0000 Greg Chicares <gchicares@sbcglobal.net> 
wrote:

GC> Vadim--We have this pattern, for data used in reports:
GC>  - data vary by context (e.g., by input fields such as 'ProductName')
GC>  - store all context-specific values in some data structure somewhere
GC>  - look them up in context, to generate reports
GC> Often we store all potential values in a std::map--e.g.,
GC>  - title_map_t static_titles() in 'ledger_evaluator.cpp'
GC> and look them up by enumerators--e.g.,
GC>  - finra_assumption_detail's anonymous local enum in 'pdf_command_wx.cpp'
GC> just to cite a couple of examples in code you've worked on. Elsewhere:
GC>  - 'dbnames.hpp' specifies an enum for the lookup key
GC>  - 'dbdict.cpp' specifies product-specific values for each key, e.g.,
GC>    in sample::sample(), and they're stored in product-specific XML files
GC>  - 'ihs_basicval.cpp' reads those files and selects values in context, e.g.:
GC>      database().query_into(DB_MaturityAge   , EndtAge);
GC> Now we have a similar need, and before making any final decision about
GC> a design to meet it, I'd like to ask whether this "map-lookup-by-enum"
GC> style seems appropriate...or is there some better alternative?

 I'm sorry, but I'm afraid I don't understand the question as it's not
clear to me what exactly is part of the requirements and what is part of
the current implementation. Literal reading of the above would seem to
imply that we need to be able to look up values by an enum, in which case
the question becomes rather tautological: there can be no better
alternative to looking up by enum if this is what we _have_ to do. But I
strongly suspect I'm just missing the point completely.

 To help me understand it, could you please explain what exactly do we need
to be able to do? Also, there are probably parts of the existing
implementation that are completely impractical to change (e.g. XML might
not be the perfect format, but it could be out of question to consider
changing it because of the numerous existing data files), but I'm not
exactly sure what they are, i.e. what could be conceivably changed and what
can't be touched at all?

GC> This "similar need" is exemplified in branch odd/string_db, which I've
GC> pushed to savannah. It's a single-commit throwaway branch for discussion
GC> only: an actual working implementation for a single lookup, described
GC> in 'ledger_invariant_init.cpp'.
GC> 
GC> We already have '.database' files that hold arithmetic scalars that can
GC> vary by up to seven parameters ("axes") for each product, and '.policy'
GC> files that hold strings--which today vary by product only, but need to
GC> vary by the same parameters '.database' files allow.

 This is probably a very naive question, but why can't we just put the
.policy files contents into .database files? Is it just because changing
the code dealing with the .database files to accept strings, i.e. use
something like "std::variant<bool, int, double, std::string>" instead of
just "double" in database_entity, so complicated that we can't
realistically afford to make this change?

GC> In several cases we've used expedients like this:
GC> 
GC>     auto const smoke_or_tobacco = 
b->database().query<oenum_smoking_or_tobacco>(DB_SmokeOrTobacco);
GC>     if(oe_tobacco_nontobacco == smoke_or_tobacco)
GC>         {
GC>         switch(b->yare_input_.Smoking)
GC>             {
GC>             case mce_smoker:    Smoker =    "Tobacco"; break;
GC>             case mce_nonsmoker: Smoker = "Nontobacco"; break;
GC>             case mce_unismoke:  Smoker = "Unitobacco"; break;
GC>             }
GC>         }
GC>     else if(oe_smoker_nonsmoker == smoke_or_tobacco)
GC>     ...
GC> 
GC> where a string must vary by a '.database' axis, but '.database' files
GC> cannot hold strings.

 Right now they can't, but couldn't this be changed? Also, in this
particular case it doesn't seem too bad to do the translation in the code
like this because the strings are elements of a small fixed set, i.e. the
actual value is an enum, not a string.

GC> Perhaps worse, in 'ledger_invariant_init.cpp', a
GC> 'PolicyForm' string is specified in each '.policy' file, but sometimes
GC> it must vary by the "state" axis, so we have this ad-hockery:
GC> 
GC>         bool alt_form = b->database().query<bool>(DB_UsePolicyFormAlt);
GC>         PolicyForm = p.datum(alt_form ? "PolicyFormAlternative" : 
"PolicyForm");
GC> 
GC> which allows one default 'PolicyForm' and exactly one alternative.

 Now that I understand how this works, it seems kind of ingenious,
actually.

GC> I propose to rationalize those examples by introducing another level
GC> of indirection, across the whole board:
GC> 
GC>  - move most strings out of '.policy' into a new file for product-specific 
lingo

 So some strings would still remain in .policy files. What are they and how
are they different from the strings that will be moved? I.e. why do we need
both .policy and .lingo files, instead of just the latter?

GC>  - look up those strings by new '.database' keys, which can vary by 
multiple axes

 This makes sense, but I still don't see why is the extra level of
indirection needed in the first place, i.e. why can't we just put the
values in .lingo files directly into the .database files instead of using
the keys for them there.

GC> Thus, in the odd/string_db branch's microscopic example:
GC> 
GC>   // This can reside in a new file similar to 'dbnames.hpp':
GC>     enum e_xyzzy
GC>         {e_policy_form
GC>         ,e_policy_form_KS_KY
GC>         };

 I'm confused by this example. Isn't the whole point to have only a single
e_policy_form which varies by e_axis_state, i.e. can have different values
for the different states?

GC>   // This can be serialized to and from a new '.lingo' file that holds all 
such
GC>   // proprietary lingo for all products:
GC>     static std::map<e_xyzzy,std::string> xyzzy
GC>         {{e_policy_form, "UL32768-NY"}
GC>         ,{e_policy_form_KS_KY, "UL32768-X"}
GC>         };
GC>   // A cover function might unify this two-stage lookup:
GC>     auto policy_form = b->database().query<e_xyzzy>(DB_L_PolicyForm);
GC>     PolicyForm = xyzzy[policy_form];

 I.e. I am not sure how is this qualitatively different from the example
using DB_UsePolicyFormAlt above. I must be blind, but I just don't see what
has changed here other than using an enum instead of the bool?

 In particular, what would we do if we need to use UL32768-I for the states
starting with the letter "I"? I thought the goal was to be able to add the
entries for Idaho, Illinois, Indiana and Iowa without changing lmi sources,
but with this approach it seems that we'd need to update them in order to
add e_policy_form_ID_IL_IN_IA. Is this really how it's supposed to work?
 


GC> Do you see some better way of doing this? Here are some considerations:
GC>  - The '.lingo' file would contain about 300 entries initially, and
GC>    would probably never grow past about 1000. Entries might range from
GC>    five to perhaps five thousand characters, a typical size being 100.
GC>  - It has to be an external file, because it contains some proprietary
GC>    lingo that must be sequestered so that lmi remains free.
GC>  - lmi can read it from file OAOO, at startup; at run time, there will
GC>    be no insertions or deletions.
GC>  - Thus, it should probably be a std::unordered_map.

 There are several comments I could make here, such as that we might use
this as an opportunity to start using something other than XML for the
external files (as I've already mentioned a couple of times in the past,
personally I find https://sqlite.org/aff_short.html very convincing and
have been perfectly happy with using SQLite in other applications) and that
we could consider finding a perfect hash for the entries if they never
change to optimize the lookup if we decide it's worth it.

 But I'm so lost in the big picture that I think I shouldn't say anything
at all before I understand it better. Because right now the "obvious" way
to solve the stated goal of being able to look up strings by database_index
seems to be to just put these strings in the .database files directly. Yet
this doesn't seem to be considered at all and I don't know why: is it
because we can't change the format of the .database files? Or because we
can't (realistically) change database_entity and the code using it? BTW, if
only one XOR the other is true, we could still change the other one, i.e.
the structure of the external files doesn't need to map to the internal
data structure bijectively.

 Sorry for the lack of any helpful feedback, maybe I could say something
more useful once I understand better what are we trying to do and, maybe
even more importantly, what are we not trying to do.

 And thanks in advance for any explanations,
VZ

Attachment: pgp9qkYPRJvGZ.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]