bug-global

[PATCH] Enhancements to global.cgi


From: Taisuke Yamada
Subject: [PATCH] Enhancements to global.cgi
Date: Tue, 27 Jul 2010 23:19:19 +0900
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Hi.

While working on the Debian package of global, I learned that there
are some ongoing changes to global.cgi.

Since I have some use cases that might be affected, I looked
into the code and came up with an extended implementation of
global.cgi. It would be nice if you would consider merging it.

The attached script has the following changes:

=== What's added ===
- In addition to the "sitekey" file, it also looks for a subdir under
  the configured search path. This simplifies setup, as you just
  need to add the parent folder to the search path once. After that,
  no configuration is needed regardless of how many projects you index.

- Supports multiple search paths for sitekey/subdir lookup.

- Runs under server-less mode, which several CLI web browsers
  support (AFAIK, w3m/lynx/elinks support this mode, though I
  only tested with w3m). This means you can browse/search
  HTML-ized source code without running a server.

- Better support for bookmarks and remote links, even under
  system-wide CGI mode. These used to break because such links
  never come with a proper Referer: URL.
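
For readers who want to see the lookup order in isolation, here is a
minimal sketch of the sitekey/subdir search described above. It is not
the code from the attached script: resolve_index_dir() and its arguments
are hypothetical names, the path written into the sitekey file is made
up, and a Unix-style ":"-separated search path is assumed. Unlike the
script, which dies on a path-like id, the sketch simply returns nothing.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);   # only needed for the throwaway demo tree

# Hypothetical helper: resolve an "id" to an HTML/ folder by walking a
# colon-separated search path. Returns nothing when no entry matches.
sub resolve_index_dir {
    my ($id, $search_path, $docdir) = @_;

    # Reject path-like ids so a request cannot escape the search path.
    return if !defined $id || $id eq '' || $id =~ m{/|\.\.};

    foreach my $dir (grep { length } split(/:/, $search_path)) {
        my $entry = "$dir/$id";

        # Subdir style: <dir>/<id>/<docdir> exists, no per-project setup.
        return "$entry/$docdir" if -d "$entry/$docdir";

        # Sitekey style: <dir>/<id> is a file whose first line is the path.
        next unless -f $entry;
        open(my $fh, '<', $entry) or next;
        my $line = <$fh>;
        close($fh);
        next unless defined $line;
        chomp($line);
        return $line;
    }
    return;
}

# Demo: one project found via subdir, one via sitekey file.
my $root = tempdir(CLEANUP => 1);
for my $d ("$root/a", "$root/b", "$root/b/proj1", "$root/b/proj1/HTML") {
    mkdir($d) or die "mkdir $d: $!";
}
open(my $key, '>', "$root/a/proj2") or die "open: $!";
print {$key} "/srv/www/proj2/HTML\n";   # made-up target path
close($key);

my $search_path = "$root/a:$root/b";
for my $id (qw(proj1 proj2 ../etc)) {
    my $dir = resolve_index_dir($id, $search_path, 'HTML');
    print "$id => ", (defined $dir ? $dir : '(not found)'), "\n";
}
```

With a layout like this, adding another project only means creating
another subdir (or sitekey file) under one of the listed parents; the
search path itself does not need to change.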

=== What's missing ===
- The generated HTML is still somewhat lacking.

Best Regards,
Taisuke Yamada
#!/usr/bin/perl
# -*- mode: perl; coding: utf-8-unix -*-

=head1 NAME

 global.cgi - Web frontend to source-code index created by gtags/htags.

=head1 SYNOPSIS

 # when running as system-wide shared CGI (uses id= to select index to show)
 http://host/cgi-bin/global.cgi?id=...;pattern=...

 # when running as per-project CGI
 http://host/per/project/dir/HTML/cgi-bin/global.cgi?pattern=...

 # when running under serverless mode (example with w3m)
 HTAGS_DIR=$(global -p)/HTML \
 w3m -o cgi_bin=/dir/where/global.cgi/resides $(global -p)/HTML/index.html

=head1 DESCRIPTION

This script provides a Web interface to the GNU Global source code
tag system.

=head1 QUICK USAGE

In most cases, you'd probably want to start with the "per-project"
configuration.

  $ cd /path/to/project
  $ gtags -v
  $ htags -Df --suggest
  $ ls 
  ... HTML/ ...
  $ ls HTML/cgi-bin/
  global.cgi index.html

With this setup, all related files go under the HTML/ folder. After
running htags(1), configure the server so that the HTML/ folder can be
accessed over the Web.

  http://your.server/path/to/HTML/

will be the starting page.

=head1 SAMPLE WEB CONFIGURATION

Although htags(1) prepares an appropriate HTML/.htaccess automatically,
some tweaks may be needed for extra features. The following is a sample
configuration that works with both compressed and uncompressed setups:

  Options +ExecCGI +MultiViews
  DirectoryIndex index
  AddHandler cgi-script .cgi
  AddType     text/html .ghtml
  AddEncoding x-gzip    .ghtml
  <Files *.html.gz>
   ForceType text/html
  </Files>

Make sure your web server permits you to define these
in .htaccess (see: AllowOverride in httpd.conf).

=head1 NOTE

This script tries not to depend on 3rd party modules, and uses
only modules bundled with Perl itself.

=cut

use Cwd qw(abs_path);
use Symbol qw(gensym);
use IPC::Open3 qw(open3);
use English;
use IO::File;
use strict;
use warnings;

# external parameter(s) in configuration
our($GSPATHDATA);

eval {
    my $opts = parse_option($ENV{QUERY_STRING});
    my $conf = setup_config($opts);
    my $prog = spawn_global($conf) || die "Failed to spawn global";
    my @buff = (gets($prog), gets($prog));

    showpage($conf, "Pattern not found") unless $buff[0]; # no result
    redirect(makelink($conf, $buff[0]))  unless $buff[1]; # just one result

    print_header($conf, html_escape("search result for $opts->{pattern}"));
    print "<h1 class='title'>" . html_escape($opts->{pattern}) . "</h1>\n";
    print "<hr/>";

    while ($_ = shift(@buff) || gets($prog)) {
        my $link = makelink($conf, $_);
        next unless s/^\d+\s+([^\s]+)//o;
        printf(<<EOF, $link, html_escape($1), html_escape($_));
<span class='curline'><a href='%s'>%s</a>%s</span>
EOF
    }
    print_footer($conf);
    close($prog);
};
showpage({}, "[ERROR] Execution failed", $@) if $@; # error handler
exit(0);

######################################################################
# functions
######################################################################

## for debugging
sub dumpenv {
    print "Content-Type: text/plain\n\n";
    print "=== ENV ===\n";
    foreach (sort keys %ENV) {
        print "$_=$ENV{$_}\n";
    }
    exit(0);
}

## return CGI query as hash table
sub parse_option {
    my $query = shift;
    my @args  = split(/[;&]/, defined $query ? $query : ''); # may be unset
    my $opts  = {};

    foreach (@args) {
        next unless length; # skip empty pairs such as "a=1&&b=2"
        my($name, $value) = map { uri_decode($_, 1) } split(/=/, $_, 2);
        $opts->{$name} = $value; # NOTE: multi-args overwritten for now
    }
    $opts;
}

## Determines and creates run-time configuration.
## All important parameters are set in this function.
sub setup_config {
    my $opts = shift;
    my $conf = { opts => $opts };

    # load external configuration
    foreach ("/etc/gtags/htmake.conf",
             "/etc/global/web.conf", "$ENV{HOME}/.global/web.conf") {
        do $_ if -f $_;
    }

    # various fallback defaults
    $opts->{pattern} ||= "main";
    $conf->{GLOBAL}  ||= "/usr/bin/global";
    $conf->{DOCDIR}  ||= "HTML"; # default name for "HTML" folder

    # Find "HTML" folder path and "base URL" to it.
    #
    # Because there are several ways to run this script, it's not
    # always possible to find these reliably. The current code tries to
    # cover the following 3 use-cases:
    #
    #   1. System-wide CGI mode (ex. "/cgi-bin/global.cgi?id=...&...")
    #   2. Per-project CGI mode (ex. "/dir/HTML/cgi-bin/global.cgi?...")
    #   3. Server-less CGI mode (local execution by several CLI-browsers)
    #
    # By overriding HTAGS_DIR/HTAGS_URL, you can directly control this
    # behavior. How and where to override is up to the user.
    $conf->{HTAGS_DIR} = $ENV{HTAGS_DIR} || find_htags_dir($conf);
    $conf->{HTAGS_URL} = $ENV{HTAGS_URL} || find_htags_url($conf);

    # GTAGSROOT defines "base dir" of project source tree
    $ENV{GTAGSROOT} = gets(IO::File->new("$conf->{HTAGS_DIR}/GTAGSROOT"))
        if -f "$conf->{HTAGS_DIR}/GTAGSROOT";

    # select default suffix by compression mode
    $conf->{SUFFIX} = -s "$conf->{HTAGS_DIR}/compress" ? 'ghtml' : 'html';

    $conf;
}

## Returns absolute path of HTML/ folder to use.
sub find_htags_dir {
    my $conf = shift;
    my $opts = $conf->{opts};

    # find dir with named key in search path
    if ($opts->{id}) {
        die "Invalid key: id. Should not use path-like string."
            if $opts->{id} =~ m!(/|\.\.)!;

        foreach (split(/:/, defined $GSPATHDATA ? $GSPATHDATA : '')) {
            return "$_/$opts->{id}/$conf->{DOCDIR}"
                if -d "$_/$opts->{id}/$conf->{DOCDIR}";
            return gets(IO::File->new("$_/$opts->{id}"))
                if -f "$_/$opts->{id}";
        }
    }

    # assume to run as per-project "<src>/<HTML>/cgi-bin/global.cgi"
    return abs_path("..");
}

## Returns base URL (absolute or path-absolute) to use for
## redirection and linking generated contents.
sub find_htags_url {
    my $conf = shift;
    my $opts = $conf->{opts};

    # If explicitly specified, use it.
    return gets(IO::File->new("$conf->{HTAGS_DIR}/HTAGS_URL"))
        if -f "$conf->{HTAGS_DIR}/HTAGS_URL";

    # This should cover most "per-project CGI" usecase...
    # (REQUEST_URI may be unset when not running under a web server)
    return "$PREMATCH/$1/$2"
        if ($ENV{REQUEST_URI} || '') =~ m|/([^/]+)/([^/]+)/cgi-bin/global\.cgi\b|;

    # Try determining from "Referer:".
    # Precedence was lowered from prior version as depending on
    # Referer: header breaks remote links and bookmarks.
    # NOTE: assign first, then test; "=" binds looser than "&&".
    my $url = $ENV{HTTP_REFERER};
    if ($url && $ENV{HTTP_HOST} &&
        $url =~ m!^https?://\Q$ENV{HTTP_HOST}\E!) { # quick validity check
        $url =~ s!(/(S|defines))?/[^/]+$!!;     # strip subdir/file part
        return $url;
    }

    # All else failed. Assume it is running under server-less mode, and
    # fallback to local path. Obviously, this never works for HTTP(S)...
    $conf->{HTAGS_DIR};
}

## invoke global(1) in right folder with given options
sub spawn_global {
    my $conf = shift;
    my $opts = $conf->{opts};
    my @prog = ($conf->{GLOBAL}, '--result=ctags-xid');

    # "type" is optional; default to empty to avoid uninitialized warnings
    my $type = defined $opts->{type} ? $opts->{type} : '';

    push(@prog, '-i') if $opts->{icase};
    push(@prog, '-o') if $opts->{other};
    push(@prog, '-r') if $type eq 'reference';
    push(@prog, '-s') if $type eq 'symbol';
    push(@prog, '-P') if $type eq 'path';
    push(@prog, '-g') if $type eq 'grep';
    push(@prog, '-I') if $type eq 'idutils';
    push(@prog, '-e', $opts->{pattern}) if $opts->{pattern};

    # adjust environment so global(1) can generate correct relative path
    chdir($conf->{HTAGS_DIR})                || die "Missing HTAGS_DIR";
    chdir($ENV{GTAGSROOT} || abs_path("..")) || die "Missing GTAGSROOT";

    # FIXME: needs some cleanup. Reading stderr to EOF before stdout
    # can block if global(1) fills the stdout pipe first.
    my($wio, $rio, $eio, $err);
    open3($wio, $rio, $eio = gensym, @prog);
    die $err if $err = join("", $eio->getlines);
    $rio;
}

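## read one line (newline stripped) from given handle; undef on EOF
sub gets {
    my $line = readline($_[0]);
    chomp($line) if defined $line; # avoid warning on EOF, keep $_ intact
    $line;
}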

sub uri_decode {
    my($arg, $is_query) = @_;

    $arg =~ tr/+/ / if $is_query;
    $arg =~ s/%([\da-f][\da-f])/pack("C", hex($1))/egi;
    $arg;
}

sub uri_encode {
    my $arg = shift;

    $arg =~ s!([^-./\w])!sprintf("%%%.2x", ord($1))!eg; # sprintf, not printf
    $arg;
}

sub html_escape {
    my $arg = shift;
    my %map = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
               '"' => '&quot;', "'" => '&#39;'); # quotes too, for attributes

    $arg =~ s/([&<>"'])/$map{$1}/ge;
    $arg;
}

## convert global(1) output into URL
sub makelink {
    my($conf, $arg) = @_;
    my($fid, $tag, $lno, $filename) = split(/\s+/, $arg);

    "$conf->{HTAGS_URL}/S/$fid.$conf->{SUFFIX}#L$lno";
}

sub redirect {
    print "Location: $_[0]\n\n"; exit(0);
}

sub showpage {
    my($conf, $title, $message) = @_;

    $message = html_escape($message || $title); # message may echo input

    print_header($conf, $title);
    print <<EOF;
=== global.cgi returned following result ===
$message
EOF
    print_footer($conf);
    exit(0);
}

sub print_header {
    my($conf, $title) = @_;
    print <<EOF;
Content-Type: text/html

<html>
<head>
<title>global - $title</title>
<meta name='robots' content='noindex,nofollow' />
<meta name='generator' content='GLOBAL' />
<link rel='stylesheet' type='text/css' href='$conf->{HTAGS_URL}/style.css' />
</head>
<body><pre>
EOF
}

sub print_footer {
    my($conf) = @_;
    print <<EOF;
</pre>
<hr />
<a href='$conf->{HTAGS_URL}/mains.$conf->{SUFFIX}'>[return]</a>
</body>
</html>
EOF
}
