[GSoC] Reafactoring Automake: Report

automake
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[GSoC] Reafactoring Automake: Report

From:	Matthias Paulmier
Subject:	[GSoC] Reafactoring Automake: Report
Date:	Fri, 01 Jun 2018 16:43:37 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
           __________________________________________________

             GSOC: MODULARIZE AUTOMAKE AND IMPROVE ITS TEST
                                 SUITE

                           Matthias Paulmier
           __________________________________________________





Introduction
============

  In this project, my first task is to analyze how Automake is built and
  how it can be improved.  The obvious first point will be to look into
  the actual "main" script of the project
  (`automake/bin/automake[.in]').  It is this big 8000+ lines of code
  file which clearly needs to be trimmed down.  Their are multiple ways
  to approach this problem.  The one I chose, which looked the most
  obvious to me, is to try and move out as much subroutines as possible
  in separate modules.  The idea is that this file should only (or
  mostly) be a caller for its submodules' methods.

  This document attempts to describe how the current hierarchy will be
  modified during the next couple of weeks.  This is not a binding
  document as I will probably change my mind when moving some code
  around or as I realize I made bad assumptions as to how the program is
  structured.  This is a first approach for me to have an easier time
  understanding where to start.


Move out the variable/constant declarations
===========================================

  The first thing I did, as explained in the introduction, was to move
  the variable and constant declarations, which were at the top of the
  file, to another module.  I created a new module called
  `Automake::Global' which held these variables.  I didn't immediately
  have a clear idea of what I would do with it but it seemed important
  to have them declared in a separate file to be able to relocate them
  when the modularization process would start.


Coupling and cohesion
=====================

  Refactoring some code consists in reapplying good practices which have
  been lost during the multiple years of active development of the
  project.  Coupling and cohesion are two important concepts in terms of
  code robustness and maintainability.

  Coupling measures the degree of dependence between modules inside a
  program.  The goal is to have as low coupling as possible (meaning as
  low dependence's as possible) inside a program.  This ensures that the
  code will be easily maintainable and/or reusable.

  Cohesion is a more abstract concept but is also an important one.  It
  measures to which degree the methods inside a module belong together.
  We want as high cohesion as possible inside a module.

  These two concepts are very important and will be taken into
  consideration all along this summer when coming up with new modules or
  moving code to existing ones.


Find a tool to check cross references
=====================================

  The next step was to find the dependencies between subroutines and
  global variables (now declared in a separate file), or other
  subroutines, to be able to group them together or make a hierarchy
  between them.

  The first thing I had to do was to find a tool that could scan these
  dependencies and print a report for me to be able to get the
  information.  I didn't know at the time what such a tool would be
  called and had a hard time browsing CPAN for something I couldn't even
  name.  I then contacted people on the #perl IRC channel to have their
  input on that.  They told me I was looking for a cross reference tool
  and pointed me to one in particular, `B::Xref'.  This program takes a
  Perl file as an input and outputs a report showing the cross
  references between and inside modules.

  I used this tool to produce a table (which you will find attached as
  table.html) (it is a very large table so I'm putting it in a separate
  file).  This table is a useful tool to be able to quickly check cross
  references for a particular subroutine but is not very efficient to
  browse.  So I then produced a Directed Acyclic Graph (DAG) to be able
  to have a better visualization of the different dependencies.  The
  hard part was to make the graph as easy to read as possible.  I first
  wanted to visually isolate functions and variables by module as
  subgraphs (clusters in `graphviz' terms) but quickly realized that it
  created a big mess of arrows crossing each other.  It wasn't visible
  at all.  So I went with a simple color coding and let the `sfdp'
  command be in charge of arranging the DAG in the most efficient way
  possible.

  You will find the full graph attached as `full_graph.svg' (the svg
  format is convenient for navigating with text searches).

  Legend for the DAG:
  - functions from `automake/bin/automake' -> circled blue nodes
  - global variables -> circled red nodes
  - each package has its own color and each call to a method of the
    package is represented by an arrow of the same color

  It may look a bit scary right now but it is really useful and the
  color coding helps following the dependencies.  Also, with some clever
  "grepping" from the source file we can get the dependency graph for a
  certain subroutine or remove the global variables.  Attached as
  `no_vars.svg', you can find the same DAG without the variables for
  example.

  To get it from the source with the following command `grep -v
  '^.*var_.*' graph.dot | grep -v '^$' | sfdp -Tpng -Goverlap=scale -o
  outfile.png'

  As we can see in these graphs, there is a lot of coupling between the
  different modules.  We want to lower this as much as possible.  The
  main file has very low cohesion too which is less of a problem for us
  since it will be corrected as we advance and move subroutines out of
  it.


Information we can get from the table/DAGs
==========================================

  From the table and the DAGs we had some crucial information directly
  coming out.  First, some subroutines and variables didn't seem to be
  used at all.  Second, some subroutines use a lot of variables.  These
  variables are manipulated by other subroutines which don't seem to
  have a lot in common.  This will need to be investigated further later
  to see what we can do with that.  And then we have subroutines that
  don't use any external variables (only lexical ones declared in their
  code) or external subroutines, such as `verbose_var' for example.
  These are usually some utility functions which we may be able to group
  together in a module.


Unused subroutines in `automake/bin/automake'
=============================================

  For each subroutine that appeared unused in the DAG, I double checked
  in the different files to be sure they were not used elsewhere or
  omitted by `B::Xref' for some reason (it can omit some variables
  sometimes).  Turns out some of them were false positives as they were
  in fact used in the file.

  Here are the unused subroutines in `automake/bin/automake':

   Subroutine name        Utility
  ---------------------------------------------------------------
   `lang_vala_rewrite'    replaces the `.vala' extension by `.c'
   `lang_java_rewrite'    returns the constant `LANG_SUBDIR'
   `lang_header_rewrite'  returns the constant `LANG_IGNORE'

  We can see that two of them simply return the value of a constant
  which doesn't seem very practical.  These subroutines will be removed
  from the program.


Unused global variables
=======================

  I used the same process for unused variables and constants.  This time
  their were more false positives as `B::Xref' doesn't seem to be able
  to recognize constant usage and some code structures such as the
  ternary operation or multiple assignments with the same `=' operator.
  The variables and constants that are not used can also be moved out of
  the program to make some clear space.

  Here are the unused variables from `automake/lib/Automake/Global.pm':

   Variable name        Value
  -------------------------------------------------------------------
   `libtool_sometimes'  `qw(ltmain.sh config.guess config.sub'
   `TARGET_PATTERN'[1]  `'address@hidden(){}/address@hidden''
   `PATH_PATTERN'[1]    `'(\w)' | `[+/.-])+''


The problem of constants
========================

  Since the tool I used would not detect constant usage properly (they
  have sometimes been misinterpreted as subroutine calls or not taken
  into consideration at all), this work had to be done mostly by hand.
  Fortunately, their wasn't as many constants as their was variables so
  it wasn't so much of a big task.


Isolated subroutines
====================

  Looking at the DAG, we can see some subroutines (circled in blue)
  which are isolated from the rest.  We should look into these and see
  if something obvious comes to mind when it comes to moving them out.
  For all the subroutines mentioned below we will move the associated
  global variables from `Automake::Global' with them when possible.  The
  idea is to group nodes together inside clusters starting from the
  leaves of our DAG.  This way we can find coherent modules while
  keeping coupling low as we work our way through it via the edges.


New modules
===========

  We have now gathered some information to create new modules from the
  parts above.

  First, for all the general purpose utility functions, we have two
  choices.  We could move these functions to a new module called
  `Automake:Utils' or move them to the already existing
  `Automake::General' module which would apparently (when looking at its
  name) fill this purpose.  When digging into this module, we can see
  that it redefines some core Perl functionalities (such as the END
  function).  Knowing that, I think it would be better to create a new
  module and place the subroutines accordingly.  The `version', `usage'
  and `parse_arguments' subroutine should be moved to this module as
  well as the subroutines cited above.

  Among the list of subroutines in the main file, we can see a number of
  subroutines with the same naming scheme (for example `handle_*' or
  `require_*').  I need to see if they all serve a related purpose and
  then group them accordingly into new or existing modules.


Existing modules
================

  When it comes to existing modules, I will try to move some subroutines
  to them when it is possible.  The goal of the project is not to add as
  much modules as possible but rather make the structure of the program
  as coherent as possible while not changing the behavior for the users.
  The important thing is to understand which module does what.  Here is
  a table with a small explanation of the modules' utility:

  - `Automake::ChannelDefs': Defines channels and helper functions;
  - `Automake::Channel': Functions for error and warning management;
  - `Automake::Condition': Record conjunction of conditionals;
  - `Automake::Config': Global variables to be used in the program (like
    the release number or year of lease);
  - `Automake::Configure_ac': Functions for locating `configure.ac' or
    `configure.in';
  - `Automake::DisjConditions': Like `Automake::Condition' but with
    disjunction instead of conjunction;
  - `Automake::FileUtils': File handling;
  - `Automake::General': Redefine Perl core functionalities;
  - `Automake::Getopt': Get command line options;
  - `Automake::Global': Global variable and constant definitions;
  - `Automake::ItemDef': Base class for `Automake::VarDef' and
    `Automake::RuleDef';
  - `Automake::Item': Base class for `Automake::Var' and
    `Automake::Rule';
  - `Automake::Language': Class for defining languages;
  - `Automake::Location': Track location with a stack of contexts;
  - `Automake::Options': Keep track of `Automake' options;
  - `Automake::RuleDef': Rule definitions;
  - `Automake::Rule': Define `Automake' rules;
  - `Automake::VarDef': Variable definitions;
  - `Automake::Variable': Define `Automake' variables;
  - `Automake::Version': Functions for comparing program versions based
    on patterns;
  - `Automake::Wrap': Functions for formatting paragraphs;
  - `Automake::XFile': Error handling in file handles;


Conclusion
==========

  As we've seen in this report, their is a big task ahead of me in terms
  of module reorganization.  For the sake of quality code and easier
  maintainability, I will add unit tests for the newly created modules
  as well as the ones that are not tested.  To see if the new
  architecture is well integrated with the already existing one, I will
  use the existing test suite and the integration/validation tests it
  provides.  At the end of the summer, Automake should feel the same for
  users while being more easily maintainable and extendable.



Footnotes
_________

[1] The variable is used internally in the global file but is unused
elsewhere so we will remove it from its default exports.


--
Matthias Paulmier
table.html
Description: Table
full_graph.svg
Description: Full dependency DAG
no_vars.png
Description: Dependency DAG without variables
[Prev in Thread]
Current Thread
[Next in Thread]
[GSoC] Reafactoring Automake: Report, Matthias Paulmier <=
- Re: [GSoC] Reafactoring Automake: Report, Matthias Paulmier, 2018/06/04
  - Re: [GSoC] Reafactoring Automake: Report, Mathieu Lirzin, 2018/06/04
Next by Date: Re: [GSoC] Reafactoring Automake: Report
Next by thread: Re: [GSoC] Reafactoring Automake: Report
Index(es):
- Date
- Thread