emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Keeping replace-buffer-contents runtime in bounds


From: Tassilo Horn
Subject: Keeping replace-buffer-contents runtime in bounds
Date: Sat, 16 Feb 2019 21:09:40 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Hi Eli & all,

my context was that I'm using restclient.el for interactively querying
REST services receiving and returning JSON data.  My problem was that
json-pretty-print doesn't restore point when pretty-printing a JSON
snippet which is annoying at least in interactive use.

Eli suggested to use replace-region-contents instead of plain delete and
insert.  That's already in master including a function
replace-region-contents, and json-pretty-print uses that.

However, while I'm very happy with using that to format JSON snippets,
it can be extremely slow when the JSON objects become too large and
there are many differences between the original and the pretty-printed
version.  At work, it happens that I'm querying a REST service and that
returns minimized JSON data (i.e., all on one line, no whitespace) in
megabyte size which restclient.el will try to pretty-print.  It happened
to me that in such cases my emacs hung for a minute or two.

So I tried looking what I could to do make replace-buffer-contents
faster (in my branch scratch/replace-region-contents).

1. There's some too_expensive field of the context struct passed to
   compareseq which has quite some effect on the speed.  I couldn't find
   any negative sides in using a much lower value than we originally
   used (1,000,000).  However, maybe it's just my use-case.  This is now
   exposed as an optional argument of replace-buffer-contents.

2. The too_expensive field is not enough.  If the buffer contents and
   the contents of the replacement buffer become too large and there are
   too many differences, it may still take ages.

So now I added some code which allows compareseq to abort early if the
difference computation is too costly.  Initially I've tried to use the
number of differences found so far plus a max value.  However, after
using that some days I noticed that this is not a too good measure.
Sometimes there were gazillion of differences, yet the difference
computation was quick.  But other times the number of differences was
lower but still it took ages (most probably because the json was
larger).

In the end I settled for a maximum number of seconds one can define by
setting a new variable replace-buffer-contents-max-secs, so that you can
define what's still acceptable in the respective use-case.  (Actually,
if you set that to 1.5 or so, it may still run for 2 or more seconds
because the EARLY_ABORT expression isn't tested at regular intervals or
rather it is, but the intervals don't take equally long.)

If that number of seconds is over, compareseq returns early and
replace-buffer-contents falls back to plain delete and insert.

This is my first C encounter in emacs, so please feel free to nit-pick.

--8<---------------cut here---------------start------------->8---
diff --git a/lisp/emacs-lisp/subr-x.el b/lisp/emacs-lisp/subr-x.el
index 7d9f0bba4c..5020772dde 100644
--- a/lisp/emacs-lisp/subr-x.el
+++ b/lisp/emacs-lisp/subr-x.el
@@ -250,6 +250,33 @@ string-remove-suffix
       (substring string 0 (- (length string) (length suffix)))
     string))
 
+(defun replace-region-contents (beg end replace-fn &optional max-costs)
+  "Replace the region between BEG and END using REPLACE-FN.
+REPLACE-FN runs on the current buffer narrowed to the region.  It
+should return either a string or a buffer replacing the region.
+
+The replacement is performed using `replace-buffer-contents'
+which also describes the MAX-COSTS argument.
+
+Note: If the replacement is a string, it'll be placed in a
+temporary buffer so that `replace-buffer-contents' can operate on
+it.  Therefore, if you already have the replacement in a buffer,
+it makes no sense to convert it to a string using
+`buffer-substring' or similar."
+  (save-excursion
+    (save-restriction
+      (narrow-to-region beg end)
+      (goto-char (point-min))
+      (let ((repl (funcall replace-fn)))
+       (if (bufferp repl)
+           (replace-buffer-contents repl max-costs)
+         (let ((source-buffer (current-buffer)))
+           (with-temp-buffer
+             (insert repl)
+             (let ((tmp-buffer (current-buffer)))
+               (set-buffer source-buffer)
+               (replace-buffer-contents tmp-buffer max-costs)))))))))
+
 (provide 'subr-x)
 
 ;;; subr-x.el ends here
diff --git a/lisp/json.el b/lisp/json.el
index 3271c373b4..455444cc32 100644
--- a/lisp/json.el
+++ b/lisp/json.el
@@ -49,10 +49,13 @@
 ;; 2008-02-21 - Installed in GNU Emacs.
 ;; 2011-10-17 - Patch `json-alist-p' and `json-plist-p' to avoid recursion -tzz
 ;; 2012-10-25 - Added pretty-printed reformatting -Ryan Crum (address@hidden)
+;; 2019-02-02 - Pretty-printing now uses replace-region-contents and support 
for
+;;              minimization -tsdh
 
 ;;; Code:
 
 (require 'map)
+(require 'subr-x)
 
 ;; Parameters
 
@@ -749,7 +752,10 @@ json-pretty-print
         (json-object-type 'alist))
     (replace-region-contents
      begin end
-     (lambda () (json-encode (json-read))))))
+     (lambda () (json-encode (json-read)))
+     ;; FIXME: What's a good value here?  Can we use something better,
+     ;; e.g., by deriving a value from the size of the region?
+     64)))
 
 (defun json-pretty-print-buffer-ordered (&optional minimize)
   "Pretty-print current buffer with object keys ordered.
diff --git a/lisp/subr.el b/lisp/subr.el
index 44a1c60894..122a0d8da4 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -5476,30 +5476,4 @@ flatten-tree
 ;; for discoverability:
 (defalias 'flatten-list 'flatten-tree)
 
-(defun replace-region-contents (beg end replace-fn)
-  "Replace the region between BEG and END using REPLACE-FN.
-REPLACE-FN runs on the current buffer narrowed to the region.  It
-should return either a string or a buffer replacing the region.
-
-The replacement is performed using `replace-buffer-contents'.
-
-Note: If the replacement is a string, it'll be placed in a
-temporary buffer so that `replace-buffer-contents' can operate on
-it.  Therefore, if you already have the replacement in a buffer,
-it makes no sense to convert it to a string using
-`buffer-substring' or similar."
-  (save-excursion
-    (save-restriction
-      (narrow-to-region beg end)
-      (goto-char (point-min))
-      (let ((repl (funcall replace-fn)))
-       (if (bufferp repl)
-           (replace-buffer-contents repl)
-         (let ((source-buffer (current-buffer)))
-           (with-temp-buffer
-             (insert repl)
-             (let ((tmp-buffer (current-buffer)))
-               (set-buffer source-buffer)
-               (replace-buffer-contents tmp-buffer)))))))))
-
 ;;; subr.el ends here
diff --git a/src/editfns.c b/src/editfns.c
index 7a600bacf1..057641a982 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -20,6 +20,7 @@ along with GNU Emacs.  If not, see 
<https://www.gnu.org/licenses/>.  */
 
 #include <config.h>
 #include <sys/types.h>
+#include <sys/time.h>
 #include <stdio.h>
 
 #ifdef HAVE_PWD_H
@@ -1912,10 +1913,6 @@ determines whether case is significant or ignored.  */)
 #undef EQUAL
 #define USE_HEURISTIC
 
-#ifdef USE_HEURISTIC
-#define DIFFSEQ_HEURISTIC
-#endif
-
 /* Counter used to rarely_quit in replace-buffer-contents.  */
 static unsigned short rbc_quitcounter;
 
@@ -1937,30 +1934,44 @@ static unsigned short rbc_quitcounter;
   /* Bit vectors recording for each character whether it was deleted
      or inserted.  */                          \
   unsigned char *deletions;                    \
-  unsigned char *insertions;
+  unsigned char *insertions;                   \
+  struct timeval start;                        \
+  double max_secs;                             \
+  unsigned int early_abort_tests;
 
 #define NOTE_DELETE(ctx, xoff) set_bit ((ctx)->deletions, (xoff))
 #define NOTE_INSERT(ctx, yoff) set_bit ((ctx)->insertions, (yoff))
+#define EARLY_ABORT(ctx) compareseq_early_abort (ctx)
 
 struct context;
 static void set_bit (unsigned char *, OFFSET);
 static bool bit_is_set (const unsigned char *, OFFSET);
 static bool buffer_chars_equal (struct context *, OFFSET, OFFSET);
+static bool compareseq_early_abort (struct context *);
 
 #include "minmax.h"
 #include "diffseq.h"
 
 DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
-       Sreplace_buffer_contents, 1, 1, "bSource buffer: ",
+       Sreplace_buffer_contents, 1, 2, "bSource buffer: ",
        doc: /* Replace accessible portion of current buffer with that of 
SOURCE.
 SOURCE can be a buffer or a string that names a buffer.
 Interactively, prompt for SOURCE.
+
+The optional argument MAX-COSTS defines the maximum costs of the
+difference computation.  If the costs are too high, heuristics are
+used to provide a faster but suboptimal solution.
+
 As far as possible the replacement is non-destructive, i.e. existing
 buffer contents, markers, properties, and overlays in the current
 buffer stay intact.
-Warning: this function can be slow if there's a large number of small
-differences between the two buffers.  */)
-  (Lisp_Object source)
+
+Because this function can be very slow if there's a large number of
+differences between the two buffers, it falls back to a plain delete
+and insert if comparing the buffers takes longer than
+`replace-buffer-contents-max-secs'.  In this case, it returns t.
+Otherwise it returns nil.  */)
+  (Lisp_Object source, Lisp_Object max_costs)
 {
   struct buffer *a = current_buffer;
   Lisp_Object source_buffer = Fget_buffer (source);
@@ -2007,6 +2018,19 @@ differences between the two buffers.  */)
   ptrdiff_t *buffer;
   USE_SAFE_ALLOCA;
   SAFE_NALLOCA (buffer, 2, diags);
+
+  if (NILP (max_costs))
+    XSETFASTINT (max_costs, 1000000);
+  else
+    CHECK_FIXNUM (max_costs);
+
+  double max_secs = -1.0;
+  if (FLOATP (Vreplace_buffer_contents_max_secs))
+    max_secs = XFLOAT_DATA (Vreplace_buffer_contents_max_secs);
+
+  struct timeval tv_cmpseq_start, tv_cmpseq_end, tv_cmpseq_duration;
+  gettimeofday(&tv_cmpseq_start, NULL);
+
   /* Micro-optimization: Casting to size_t generates much better
      code.  */
   ptrdiff_t del_bytes = (size_t) size_a / CHAR_BIT + 1;
@@ -2022,20 +2046,34 @@ differences between the two buffers.  */)
     .insertions = SAFE_ALLOCA (ins_bytes),
     .fdiag = buffer + size_b + 1,
     .bdiag = buffer + diags + size_b + 1,
-#ifdef DIFFSEQ_HEURISTIC
     .heuristic = true,
-#endif
     /* FIXME: Find a good number for .too_expensive.  */
-    .too_expensive = 64,
+    .too_expensive = XFIXNUM (max_costs),
+    .max_secs = max_secs,
+    .early_abort_tests = 0
   };
   memclear (ctx.deletions, del_bytes);
   memclear (ctx.insertions, ins_bytes);
+
   /* compareseq requires indices to be zero-based.  We add BEGV back
      later.  */
+  gettimeofday (&ctx.start, NULL);
   bool early_abort = compareseq (0, size_a, 0, size_b, false, &ctx);
-  /* Since we didn’t define EARLY_ABORT, we should never abort
-     early.  */
-  eassert (! early_abort);
+
+  gettimeofday (&tv_cmpseq_end, NULL);
+  timersub (&tv_cmpseq_end, &tv_cmpseq_start, &tv_cmpseq_duration);
+  message ("compareseq returned %s after %d.%d secs and %d early_abort_tests.",
+          early_abort ? "early" : "normally",
+          tv_cmpseq_duration.tv_sec, tv_cmpseq_duration.tv_usec,
+          ctx.early_abort_tests);
+
+  if (early_abort)
+    {
+      del_range (min_a, ZV);
+      Finsert_buffer_substring (source, Qnil,Qnil);
+      SAFE_FREE_UNBIND_TO (count, Qnil);
+      return Qt;
+    }
 
   rbc_quitcounter = 0;
 
@@ -2097,6 +2135,7 @@ differences between the two buffers.  */)
       --i;
       --j;
     }
+
   SAFE_FREE_UNBIND_TO (count, Qnil);
   rbc_quitcounter = 0;
 
@@ -2173,6 +2212,19 @@ buffer_chars_equal (struct context *ctx,
     == BUF_FETCH_MULTIBYTE_CHAR (ctx->buffer_b, bpos_b);
 }
 
+static bool
+compareseq_early_abort (struct context *ctx)
+{
+  ctx->early_abort_tests++;
+  if (ctx->max_secs < 0.0)
+    return false;
+
+  struct timeval now, diff;
+  gettimeofday (&now, NULL);
+  timersub (&now, &ctx->start, &diff);
+  return diff.tv_sec + diff.tv_usec / 1000000.0 > ctx->max_secs;
+}
+

 static void
 subst_char_in_region_unwind (Lisp_Object arg)
@@ -4441,6 +4493,12 @@ it to be non-nil.  */);
   binary_as_unsigned = true;
 #endif
 
+  DEFVAR_LISP ("replace-buffer-contents-max-secs",
+              Vreplace_buffer_contents_max_secs,
+              doc: /* If differencing the two buffers takes longer than this,
+`replace-buffer-contents' falls back to a plain delete and insert.  */);
+  Vreplace_buffer_contents_max_secs = Qnil;
+
   defsubr (&Spropertize);
   defsubr (&Schar_equal);
   defsubr (&Sgoto_char);
--8<---------------cut here---------------end--------------->8---

Bye,
Tassilo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]