gnuastro-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnuastro-commits] master cf6ca3c: Library (kdtree.h): new k-d tree spac


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master cf6ca3c: Library (kdtree.h): new k-d tree space-partitioning tree library
Date: Sun, 30 Aug 2020 22:47:02 -0400 (EDT)

branch: master
commit cf6ca3cc98a4df0dd5c927ce02115d07d480aea9
Author: Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Library (kdtree.h): new k-d tree space-partitioning tree library
    
    With this commit, a new k-d tree library is added in Gnuasto, that is
    written from scratch for optimal memory and CPU usage. It currently
    contains two higher level functions for tree creation and nearest neighbour
    searches.
    
    This is particularly useful for matching multidimensional points, which
    will be used in the future when the quad matching algorithm will be
    integrated for astrometry in the 'match' program. It can also be used for
    more optimal and multi-threaded matching of large catalogs.
---
 NEWS                  |   2 +
 doc/gnuastro.texi     | 146 ++++++++++++-
 lib/Makefile.am       |  14 +-
 lib/gnuastro/kdtree.h |  62 ++++++
 lib/kdtree.c          | 563 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 778 insertions(+), 9 deletions(-)

diff --git a/NEWS b/NEWS
index 8e0ebc5..9a16024 100644
--- a/NEWS
+++ b/NEWS
@@ -87,6 +87,8 @@ See the end of the file for license conditions.
    - GAL_INTERPOLATE_NEIGHBORS_FUNC_MAX: Use maximum for interpolation.
    - GAL_INTERPOLATE_NEIGHBORS_FUNC_MEDIAN: Use median for interpolation.
    - GAL_INTERPOLATE_NEIGHBORS_FUNC_INVALID: for error-handling/completeness.
+   - gal_kdtree_create: Create a k-d tree for optimal spatial searching.
+   - gal_kdtree_nearest_neighbour: Find the nearest neighbor using a k-d tree.
    - gal_statistics_histogram2d: Generate 2D histogram from two columns.
    - GAL_WCS_FLTERROR: Limit to identify a floating point error for WCS.
    - gal_wcs_write: Write the given WCS into a FITS extension with no data.
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index de352fc..c683ccc 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -615,6 +615,7 @@ Gnuastro library
 * Bounding box::                Finding the bounding box.
 * Polygons::                    Working with the vertices of a polygon.
 * Qsort functions::             Helper functions for Qsort.
+* K-d tree::                    Space partitioning in K dimensions.
 * Permutations::                Re-order (or permute) the values in a dataset.
 * Matching::                    Matching catalogs based on position.
 * Statistical operations::      Functions for basic statistics.
@@ -18934,6 +18935,7 @@ If you use the Info version of this manual (see 
@ref{Info}), you don't have to w
 * Bounding box::                Finding the bounding box.
 * Polygons::                    Working with the vertices of a polygon.
 * Qsort functions::             Helper functions for Qsort.
+* K-d tree::                    Space partitioning in K dimensions.
 * Permutations::                Re-order (or permute) the values in a dataset.
 * Matching::                    Matching catalogs based on position.
 * Statistical operations::      Functions for basic statistics.
@@ -24102,7 +24104,7 @@ Finally, both these arrays are merged together to get 
the final sorted array of
 
 
 
-@node Qsort functions, Permutations, Polygons, Gnuastro library
+@node Qsort functions, K-d tree, Polygons, Gnuastro library
 @subsection Qsort functions (@file{qsort.h})
 
 @cindex @code{qsort}
@@ -24231,7 +24233,147 @@ increasing order (first element will have the 
smallest value).
 
 
 
-@node Permutations, Matching, Qsort functions, Gnuastro library
+
+@node K-d tree, Permutations, Qsort functions, Gnuastro library
+@subsection K-d tree (@file{kdtree.h})
+@cindex K-d tree
+K-d tree is a space-partitioning binary search tree for organizing points in a 
k-dimensional space.
+They are a very useful data structure for multidimensional searches like range 
searches and nearest neighbor searches.
+For a more formal and complete introduction see 
@url{https://en.wikipedia.org/wiki/K-d_tree, the Wikipedia page}.
+
+Each non-leaf node in a k-d tree divides the space into two parts, known as 
half-spaces.
+To select the top/root node for partitioning, we find the median of the points 
and make a hyperplane normal to the first dimension.
+The points to the left of this space are represented by the left subtree of 
that node and points to the right of the space are represented by the right 
subtree.
+This is then repeated for all the points in the input, thus associating a 
``left'' and ``right'' branch for each input point.
+
+Gnuastro uses the standard algorithms of the k-d tree with one small 
difference that makes it much more memory and CPU optimized.
+The set of input points that define the tree nodes are given as a list of 
Gnuastro's data container type, see @ref{List of gal_data_t}.
+Each @code{gal_data_t} in the list represents the point's coordinate in one 
dimension, and the first element in the list is the first dimension.
+Hence the number of data values in each @code{gal_data_t} (which must be equal 
in all of them) represents the number of points.
+This is the same format that Gnuastro's Table reading/writing functions 
read/write columns in tables, see @ref{Table input output}.
+
+The output k-d tree is a list of two @code{gal_data_t}s, representing the 
input's row-number (or index, counting from 0) of the left and right subtrees 
of each row.
+Each @code{gal_data_t} thus has the same number of rows (or points) as the 
input, but only containing integers with a type of @code{uint32_t} (unsigned 
32-bit integer).
+If a node has no left, or right subtree, then @code{GAL_BLANK_UINT32} will be 
used.
+Below you can see the simple tree for 2D points from Wikipedia.
+The input point coordinates are represented as two input @code{gal_data_t}s 
(@code{X} and @code{Y}, where @code{X->next=Y} and @code{Y->next=NULL}).
+If you had three dimensional points, you could define an extra 
@code{gal_data_t} such that @code{Y->next=Z} and @code{Z->next=NULL}.
+The output is always a list of two @code{gal_data_t}s, where the first one 
contains the index of the left sub-tree in the input, and the second one, the 
index of the right subtree.
+The index of the root node (@code{2} in the case below) is also returned as a 
single number.
+
+@example
+INDEX         INPUT              OUTPUT                 K-D Tree
+(as guide)    X --> Y        LEFT --> RIGHT           (visualized)
+----------    -------        --------------        ------------------
+0             5     4        1        4                   (7,2)
+1             2     3        BLANK    BLANK               /   \
+2             7     2        0        3               (5,4)    \
+3             9     6        5        BLANK           /   \    (9,6)
+4             4     7        BLANK    BLANK       (2,3) (4,7)  /
+5             8     1        BLANK    BLANK                   (8,1)
+@end example
+
+This format is therefore scalable to any number of dimensions: the number of 
dimensions are determined from the number of nodes in the input list of 
@code{gal_data_t}s (for example, using @code{gal_list_data_number}).
+In Gnuastro's k-d tree implementation, there are thus no special structures to 
keep every tree node (which would take extra memory and would need to be moved 
around as the tree is being created).
+Everything is done internally on the index of each point in the input dataset: 
the only thing that is flipped/sorted during tree creation is the index to the 
input row for any number of dimensions.
+As a result, Gnuastro's k-d tree implementation is very memory and CPU 
efficient and its two output columns can directly be written into a standrad 
table (without having to define any special binary format).
+
+@deftypefun {gal_data_t *} gal_kdtree_create (gal_data_t @code{*coords_raw}, 
size_t @code{*root})
+Create a k-d tree in a bottom-up manner (from leaves to the root).
+This function returns two @code{gal_data_t}s connected as a list, see 
description above.
+The first dataset contains the indexes of left and right nodes of the subtrees 
for each input node.
+The index of the root node is written into the memory that @code{root} points 
to.
+@code{coords_raw} is the list of the input points (one @code{gal_data_t} per 
dimension, see above).
+
+@example
+#include <stdio.h>
+#include <gnuastro/table.h>
+#include <gnuastro/kdtree.h>
+
+int
+main (void)
+@{
+  size_t root;
+  gal_data_t *input, *kdtree;
+  char kdtreefile[]="kd-tree.txt";    /* Inputs and outputs can */
+  char inputfile[]="coordinates.txt"; /* also be FITS tables.   */
+
+  /* Read the input table. Note: this assumes the table only
+   * contains your input point coordinates (one column for each
+   * dimension). If it contains more columns with other properties
+   * for each point, you can specify which columns to read by
+   * name or number, see the documentation of 'gal_table_read'. */
+  input=gal_table_read(inputfile, "1", NULL, NULL,
+                       GAL_TABLE_SEARCH_NAME, 0, -1, 0, NULL);
+
+  /* Construct a k-d tree. The index of root is stored in `root` */
+  kdtree=gal_kdtree_create(coords_raw, &root);
+
+  /* Write output to a file. */
+  gal_table_write(kdtree, NULL, GAL_TABLE_FORMAT_BFITS,
+                  kdtreefile, "kdtree", 0);
+
+  /* Report k-d tree root. */
+  printf("Root row of '%s': %zu\n", kdtreefile, root);
+
+  /* Clean up and return. */
+  gal_list_data_free(input);
+  gal_list_data_free(kdtree);
+  return EXIT_SUCCESS;
+@}
+@end example
+@end deftypefun
+
+@deftypefun size_t gal_kdtree_nearest_neighbour (gal_data_t 
@code{*coords_raw}, gal_data_t @code{*kdtree}, size_t @code{root}, double 
@code{*point}, double @code{*least_dist})
+Returns the index of the nearest input point to the query point (@code{point}, 
assumed to be an array with same number of elements as @code{gal_data_t}s in 
@code{coords_raw}).
+The distance between the query point and its nearest neighbor is stored in the 
space that @code{least_dist} points to.
+This search is efficient due to the constantly checking for the presence of 
possible best points in other brances.
+If it isn't possible for the other branch to have a better nearest neighbor, 
that branch is not searched.
+
+For example the function below reads the input, and the k-d tree file created 
in the example of @code{gal_kdtree_create} (so you won't have to re-create the 
k-d tree every time) and finds the input point that is closest to a given query 
point.
+
+@example
+#include <stdio.h>
+#include <gnuastro/table.h>
+#include <gnuastro/kdtree.h>
+
+int
+main (void)
+@{
+  size_t root=KDTREE_ROOT;
+  gal_data_t *input, *kdtree;
+  char kdtreefile[]="kd-tree.txt";    /* Inputs and outputs can */
+  char inputfile[]="coordinates.txt"; /* also be FITS tables.   */
+
+  double point[2]=@{9,8@};              /* assuming a 2-d tree.   */
+  size_t nearest_index;
+
+  /* Read the input coordinates, see comments in example of
+   * 'gal_kdtree_create' for more. */
+  input=gal_table_read(inputfile, "1", NULL, NULL,
+                       GAL_TABLE_SEARCH_NAME, 0, -1, 0, NULL);
+
+  /* Read the k-d tree (created before). */
+  kdtree=gal_table_read(kdtreefile, "1", NULL, NULL,
+                        GAL_TABLE_SEARCH_NAME, 0, -1, 0, NULL);
+
+  /* Find the nearest neighbour of the point. */
+  nearest_index=gal_kdtree_nearest_neighbour(input, kdtree, root,
+                                             point);
+
+  /* Clean up and return. */
+  gal_list_data_free(input);
+  gal_list_data_free(kdtree);
+  return EXIT_SUCCESS;
+@}
+@end example
+@end deftypefun
+
+
+
+
+
+@node Permutations, Matching, K-d tree, Gnuastro library
 @subsection Permutations (@file{permutation.h})
 @cindex permutation
 Permutation is the technical name for re-ordering of values. The need for
diff --git a/lib/Makefile.am b/lib/Makefile.am
index cbd4f0b..766bb23 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -67,9 +67,9 @@ libgnuastro_la_SOURCES = $(MAYBE_WCSDISTORTION) arithmetic.c \
   arithmetic-le.c arithmetic-lt.c arithmetic-minus.c arithmetic-modulo.c \
   arithmetic-multiply.c arithmetic-ne.c arithmetic-or.c arithmetic-plus.c \
   array.c binary.c blank.c box.c checkset.c convolve.c cosmology.c data.c \
-  eps.c fits.c git.c interpolate.c jpeg.c label.c list.c match.c options.c \
-  pdf.c permutation.c pointer.c polygon.c qsort.c dimension.c speclines.c \
-  statistics.c table.c tableintern.c threads.c tiff.c tile.c \
+  eps.c fits.c git.c interpolate.c jpeg.c kdtree.c label.c list.c match.c \
+  options.c pdf.c permutation.c pointer.c polygon.c qsort.c dimension.c \
+  speclines.c statistics.c table.c tableintern.c threads.c tiff.c tile.c \
   tile-internal.c timing.c txt.c type.c units.c wcs.c
 
 
@@ -85,10 +85,10 @@ pkginclude_HEADERS = gnuastro/config.h 
$(headersdir)/arithmetic.h \
   $(headersdir)/box.h $(headersdir)/convolve.h $(headersdir)/cosmology.h \
   $(headersdir)/data.h $(headersdir)/dimension.h $(headersdir)/eps.h \
   $(headersdir)/fits.h $(headersdir)/git.h $(headersdir)/interpolate.h \
-  $(headersdir)/jpeg.h $(headersdir)/label.h $(headersdir)/list.h \
-  $(headersdir)/match.h $(headersdir)/pdf.h $(headersdir)/permutation.h \
-  $(headersdir)/pointer.h $(headersdir)/polygon.h $(headersdir)/qsort.h \
-  $(headersdir)/speclines.h $(headersdir)/statistics.h \
+  $(headersdir)/jpeg.h $(headersdir)/jpeg.h $(headersdir)/label.h \
+  $(headersdir)/list.h $(headersdir)/match.h $(headersdir)/pdf.h \
+  $(headersdir)/permutation.h $(headersdir)/pointer.h $(headersdir)/polygon.h \
+  $(headersdir)/qsort.h $(headersdir)/speclines.h $(headersdir)/statistics.h \
   $(headersdir)/table.h $(headersdir)/threads.h $(headersdir)/tiff.h \
   $(headersdir)/tile.h $(headersdir)/txt.h $(headersdir)/type.h \
   $(headersdir)/units.h $(headersdir)/wcs.h
diff --git a/lib/gnuastro/kdtree.h b/lib/gnuastro/kdtree.h
new file mode 100644
index 0000000..2dd3530
--- /dev/null
+++ b/lib/gnuastro/kdtree.h
@@ -0,0 +1,62 @@
+/*********************************************************************
+kdtree -- Create kd-tree and nearest neighbour searches.
+This is part of GNU Astronomy Utilities (Gnuastro) package.
+
+Original author:
+     Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
+Contributing author(s):
+     Mohammad Akhlaghi <mohammad@akhlaghi.org>
+Copyright (C) 2020, Free Software Foundation, Inc.
+
+Gnuastro is free software: you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation, either version 3 of the License, or (at your
+option) any later version.
+
+Gnuastro is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Gnuastro. If not, see <http://www.gnu.org/licenses/>.
+**********************************************************************/
+#ifndef __GAL_KDTREE_H__
+#define __GAL_KDTREE_H__
+
+/* Include other headers if necessary here. Note that other header files
+   must be included before the C++ preparations below */
+#include <gnuastro/data.h>
+
+
+/* C++ Preparations */
+#undef __BEGIN_C_DECLS
+#undef __END_C_DECLS
+#ifdef __cplusplus
+# define __BEGIN_C_DECLS extern "C" {
+# define __END_C_DECLS }
+#else
+# define __BEGIN_C_DECLS                /* empty */
+# define __END_C_DECLS                  /* empty */
+#endif
+/* End of C++ preparations */
+
+
+
+/* Actual header contants (the above were for the Pre-processor). */
+__BEGIN_C_DECLS  /* From C++ preparations */
+
+
+
+gal_data_t *
+gal_kdtree_create(gal_data_t *coords_raw, size_t *root);
+
+size_t
+gal_kdtree_nearest_neighbour(gal_data_t *coords_raw, gal_data_t *kdtree,
+                             size_t root, double *point, double *least_dist);
+
+
+
+__END_C_DECLS    /* From C++ preparations */
+
+#endif           /* __GAL_KDTREE_H__ */
diff --git a/lib/kdtree.c b/lib/kdtree.c
new file mode 100644
index 0000000..edeff89
--- /dev/null
+++ b/lib/kdtree.c
@@ -0,0 +1,563 @@
+/*********************************************************************
+kdtree -- Create k-d tree and nearest neighbour searches.
+This is part of GNU Astronomy Utilities (Gnuastro) package.
+
+Original author:
+     Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
+Contributing author(s):
+     Mohammad Akhlaghi <mohammad@akhlaghi.org>
+Copyright (C) 2020, Free Software Foundation, Inc.
+
+Gnuastro is free software: you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation, either version 3 of the License, or (at your
+option) any later version.
+
+Gnuastro is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Gnuastro. If not, see <http://www.gnu.org/licenses/>.
+**********************************************************************/
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <error.h>
+#include <float.h>
+
+#include <gnuastro/data.h>
+#include <gnuastro/table.h>
+#include <gnuastro/blank.h>
+#include <gnuastro/pointer.h>
+#include <gnuastro/permutation.h>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+/****************************************************************
+ ********                  Utilities                      *******
+ ****************************************************************/
+/* Main structure to keep kd-tree parameters. */
+struct kdtree_params
+{
+  size_t ndim;
+  size_t *input_row;
+  gal_data_t **coords;
+  uint32_t *left, *right;
+  gal_data_t *left_col, *right_col;
+};
+
+
+
+
+
+/* Swap 2 nodes of the tree. Instead of physically swaping all the values
+   we swap just the indexes of the node. */
+static void
+kdtree_node_swap(struct kdtree_params *p, size_t node1, size_t node2)
+{
+  uint32_t tmp_left=p->left[node1];
+  uint32_t tmp_right=p->right[node1];
+  size_t tmp_input_row=p->input_row[node1];
+
+  /* No need to swap same node. */
+  if(node1==node2) return;
+
+  // printf("left = %u, right = %u\n", tmp_left, tmp_right);
+  p->left[node1]=p->left[node2];
+  p->right[node1]=p->right[node2];
+  p->input_row[node1]=p->input_row[node2];
+
+  p->left[node2]=tmp_left;
+  p->right[node2]=tmp_right;
+  p->input_row[node2]=tmp_input_row;
+}
+
+
+
+
+
+/* Return the distance between 2 given nodes.  This distance is equivalent
+   to the radius of the hypersphere having `node` as the center.
+
+   Return:
+   Radial distace from given point to the node.
+*/
+static double
+kdtree_distance_find(struct kdtree_params *p, size_t node,
+                     double *point)
+{
+  size_t i;
+  double *carr;
+  double t_distance, node_distance=0;
+
+  /* For all dimensions. */
+  for(i=0; i<p->ndim; ++i)
+    {
+      carr=p->coords[i]->array;
+      t_distance=carr[node]-point[i];
+
+      node_distance += t_distance*t_distance;
+    }
+
+  return node_distance;
+}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+/****************************************************************
+ ********           Preperations and Cleanup              *******
+ ****************************************************************/
+/* */
+static void
+kdtree_prepare(struct kdtree_params *p, gal_data_t *coords_raw)
+{
+  size_t i;
+  gal_data_t *tmp;
+  p->ndim=gal_list_data_number(coords_raw);
+
+  /* Allocate the coordinate array. */
+  errno=0;
+  p->coords=malloc(p->ndim*sizeof(**(p->coords)));
+  if(p->coords==NULL)
+    error(EXIT_FAILURE, errno, "%s: couldn't allocate %zu bytes "
+         "for 'coords'", __func__, p->ndim*sizeof(**(p->coords)));
+
+  /* Convert input to double type. */
+  tmp=coords_raw;
+  for(i=0; i<p->ndim; ++i)
+    {
+      if(tmp->type == GAL_TYPE_FLOAT64)
+       p->coords[i]=tmp;
+      else
+        p->coords[i]=gal_data_copy_to_new_type (tmp, GAL_TYPE_FLOAT64);
+
+      /* Go to the next column list. */
+      tmp=tmp->next;
+    }
+
+  /* If the 'left_col' is already defined, then we just need to do
+     some sanity checks. */
+  if(p->left_col)
+    {
+      /* Make sure there is more than one column. */
+      if(p->left_col->next==NULL)
+        error(EXIT_FAILURE, 0, "%s: the input kd-tree should be 2 columns",
+              __func__);
+
+      /* Set the right column and check if there aren't any
+         more columns. */
+      p->right_col=p->left_col->next;
+      if(p->right_col->next)
+        error(EXIT_FAILURE, 0, "%s: the input kd-tree shoudn't be more "
+              "than 2 columns", __func__);
+
+      /* Make sure they are the same size. */
+      if(p->left_col->size!=p->right_col->size)
+        error(EXIT_FAILURE, 0, "%s: left and right columns should have "
+              "same size", __func__);
+
+      /* Make sure left is 'uint32_t'. */
+      if(p->left_col->type!=GAL_TYPE_UINT32)
+        error(EXIT_FAILURE, 0, "%s: left kd-tree column should be uint32_t",
+              __func__);
+
+      /* Make sure right is 'uint32_t'. */
+      if(p->right_col->type!=GAL_TYPE_UINT32)
+        error(EXIT_FAILURE, 0, "%s: right kd-tree column should be uint32_t",
+              __func__);
+
+      /* Initailise left and right arrays. */
+      p->left=p->left_col->array;
+      p->right=p->right_col->array;
+    }
+  else
+    {
+      /* Allocate and initialise the kd-tree input_row. */
+      p->input_row=gal_pointer_allocate(GAL_TYPE_SIZE_T, coords_raw->size, 0,
+                                        __func__, "p->input_row");
+      for(i=0; i<coords_raw->size; ++i)        p->input_row[i]=i;
+
+      /* Allocate output and initialize them. */
+      p->left_col=gal_data_alloc(NULL, GAL_TYPE_UINT32, 1,
+                                 coords_raw->dsize, NULL, 0,
+                                 coords_raw->minmapsize,
+                                 coords_raw->quietmmap, "left",
+                                 "index",
+                                 "index of left subtree in the kd-tree");
+      p->right_col=gal_data_alloc(NULL, GAL_TYPE_UINT32, 1,
+                                  coords_raw->dsize, NULL, 0,
+                                  coords_raw->minmapsize,
+                                  coords_raw->quietmmap, "right",
+                                  "index",
+                                  "index of right subtree in the kd-tree");
+
+      /* Fill the elements of the params structure. */
+      p->left_col->next=p->right_col;
+
+      /* Initialise the left and right arrays. */
+      p->left=p->left_col->array;
+      p->right=p->right_col->array;
+      for(i=0;i<coords_raw->size;++i)
+        { p->left[i]=p->right[i]=GAL_BLANK_UINT32; }
+    }
+}
+
+
+
+
+
+/* Cleanup the data and free the memory used by the structure after use. */
+static void
+kdtree_cleanup(struct kdtree_params *p, gal_data_t *coords_raw)
+{
+  size_t i;
+  gal_data_t *tmp;
+
+  /* Clean up. */
+  tmp=coords_raw;
+  for(i=0; i<p->ndim; ++i)
+    {
+      if(p->coords[i]!=tmp) gal_data_free(p->coords[i]);
+      tmp=tmp->next;
+    }
+
+  /* Free memory. */
+  free(p->coords);
+  free(p->input_row);
+}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+/****************************************************************
+ ********                Create KD-Tree                   *******
+ ****************************************************************/
+/* Find the median to seperate the hyperspace. Instead of randomly chossing
+   the media-point, we use `quickselect alogorithm` to find median in
+   average complexity of O(n). This also makes nodes partially sorted w.r.t
+   each axis.  See `https://en.wikipedia.org/wiki/Quickselect` for
+   pseudocode and more information of the algorithm.
+
+   Return : median point(node) that splits the hyperspace. */
+static size_t
+kdtree_median_find(struct kdtree_params *p, size_t node_left,
+                   size_t node_right, double *coordinate)
+{
+  double pivot_value;
+  size_t i, store_i, node_pivot, node_k;
+
+  /* False state, this is a programming error. */
+  if (node_right < node_left)
+    error(EXIT_FAILURE, 0, "%s: a bug! Please contact us to fix "
+          "the problem! For some reason, the node_right (%zu) is "
+          "smaller than node_left (%zu)", __func__, node_right,
+          node_left);
+
+  /* If the two nodes are the same, just return the node. */
+  if (node_right == node_left)
+    error(EXIT_FAILURE, 0, "%s: a bug! Please contact us to fix "
+          "the problem! For some reason, the node_right (%zu) is "
+          "equal to node_left (%zu)", __func__, node_right, node_left);
+
+  /* The middle value (here used as pivot) between node_left
+     and node_right. */
+  node_k=node_left+(node_right-node_left)/2;
+
+  /* Loop until the median (for the current axis) is returned (and in
+     the mean-while, the underlying indexs are sorted). */
+  while(1)
+    {
+      /* In every run, 'node_left' and 'node_right' change, so the
+         pivot index and pivot value change. */
+      node_pivot = node_left+(node_right-node_left)/2;
+      pivot_value = coordinate[p->input_row[node_pivot]];
+
+      /* Move the pivot_value to the right/larger end. */
+      kdtree_node_swap(p, node_pivot, node_right);
+
+      /* Move all nodes smaller than pivot value to the left/smaller
+         side of the nodes. */
+      store_i=node_left;
+      for (i = node_left; i < node_right; ++i)
+        if (coordinate[p->input_row[i]] < pivot_value)
+          {
+            /* Move ith-node to the left/smaller side,
+               and increment store_i */
+            kdtree_node_swap(p, store_i, i);
+
+            /* Prepare the place of next smaller node. */
+            store_i++;
+          }
+
+      /* Move pivot, to be just after of all the nodes that are less
+         than it (because pivot was moved to 'node_right'). */
+      kdtree_node_swap(p, node_right, store_i);
+
+      /* Set node_pivot to be the store_i. */
+      node_pivot=store_i;
+
+      /* If median is found, break the loop and return the node_k. */
+      if (node_k == node_pivot) break;
+
+      /* Change the left or right node based on the position of node_pivot
+         so we can continue the search/sort in the loop. */
+      if (node_k < node_pivot) node_right = node_pivot - 1;
+      else                     node_left  = node_pivot + 1;
+    }
+
+  /* Return the pivot node. */
+  return node_pivot;
+}
+
+
+
+
+
+
+/* Make a kd-tree from a given set of points. For tree construction, a
+   median point is selected for each axis and the left and right branches
+   are made by comparing points based on that axis.
+
+   Return : a balanced kd-tree.
+*/
+static uint32_t
+kdtree_fill_subtrees(struct kdtree_params *p, size_t node_left,
+                     size_t node_right, size_t depth)
+{
+  /* Set the working axis. */
+  size_t axis=depth % p->ndim;
+
+  /* node_median is a counter over the `input_row` array.
+     `input_row` array has the input_row(row number). */
+  size_t node_median;
+
+  /* Recursion terminates when the left and right nodes are the
+     same. */
+  if(node_left==node_right) return p->input_row[node_left];
+
+  /* Find the median node. */
+  node_median=kdtree_median_find(p, node_left, node_right,
+                                 p->coords[axis]->array);
+
+  /* node_median == 0 : We are in the lowest node (leaf) so no need
+     to continue seachin recursively.
+     When we only have 2 nodes and the median is equal to the left,
+     its the end of the subtree.
+  */
+  if(node_median)
+    p->left[node_median] = ( (node_median == node_left)
+                             ? GAL_BLANK_UINT32
+                             : kdtree_fill_subtrees(p, node_left,
+                                                    node_median-1,
+                                                    depth+1) );
+
+  /* Right and left nodes are non-symytrical. Node left can be equal
+     to node median when there are only 2 points and at this point,
+     there can never be a single point (node left == node right).
+     But node right can never be equal to node median.
+     So we don't check for it.*/
+  p->right[node_median] = kdtree_fill_subtrees(p, node_median+1,
+                                               node_right, depth+1);
+
+  /* All subtrees have been parsed, return the node. */
+  return p->input_row[node_median];
+}
+
+
+
+
+
+/* High level function to construct the kd-tree. This function initilises
+   and creates the tree in top-down manner. Returns a list containing the
+   indexes of left and right subtrees. */
+gal_data_t *
+gal_kdtree_create(gal_data_t *coords_raw, size_t *root)
+{
+  struct kdtree_params p={0};
+
+  /* Initialise the params structure. */
+  kdtree_prepare(&p, coords_raw);
+
+  /* Fill the kd-tree*/
+  *root=kdtree_fill_subtrees(&p, 0, coords_raw->size-1, 0);
+
+  /* For a check
+  size_t i;
+  for(i=0;i<coords_raw->size;++i)
+    printf("%-15zu%-15u%-15u\n", p.input_row[i], p.left[i], p.right[i]);
+  */
+
+  /* Do a reverse permutation to sort the indexes
+     back in the input order. */
+  gal_permutation_apply_inverse (p.left_col, p.input_row);
+  gal_permutation_apply_inverse (p.right_col, p.input_row);
+
+  /* Free and clean up */
+  kdtree_cleanup(&p, coords_raw);
+
+  /* Return results. */
+  return p.left_col;
+}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+/****************************************************************
+ ********          Nearest-Neighbour Search               *******
+ ****************************************************************/
+/* Find the nearest neighbour of the `point`.  See
+   `https://en.wikipedia.org/wiki/K-d_tree#Nearest_neighbour_search` for
+   more information. */
+static void
+kdtree_nearest_neighbour(struct kdtree_params *p, uint32_t node_current,
+                         double *point, double *least_dist,
+                         size_t *out_nn, size_t depth)
+{
+  double d, dx, dx2;
+  size_t axis=depth % p->ndim;    /* Set the working axis. */
+  double *coordinates=p->coords[axis]->array;
+
+  /* If no subtree present, don't search further. */
+  if(node_current==GAL_BLANK_UINT32) return;
+
+  /* The distance between search point to the current node.*/
+  d = kdtree_distance_find(p, node_current, point);
+
+  /* Distance between the splitting coordinate of the search
+     point and current node. */
+  dx = coordinates[node_current]-point[axis];
+
+  /* Check if the current node is nearer than the previous
+     nearest node. */
+  if(d < *least_dist)
+    {
+      *least_dist = d;
+      *out_nn = node_current;
+    }
+
+  /* If exact match found(least distance 0), return it. */
+  if(*least_dist==0.0f) return;
+
+  /* Recursively search in subtrees. */
+  kdtree_nearest_neighbour(p, dx > 0
+                              ? p->left[node_current]
+                              : p->right[node_current],
+                           point, least_dist, out_nn, depth+1);
+
+  /* Since the hyperplanes are all axis-aligned, to check if there is a
+     node in other branch which is nearer to the current node is done by a
+     simple comparison to see whether the distance between the splitting
+     coordinate (median node) of the search point and current node is
+     lesser (i.e on same side of hyperplane) than the distance (overall
+     coordinates) from the search point to the current nearest. */
+  dx2 = dx*dx;
+  if(dx2 >= *least_dist) return;
+
+  /* Recursively search other subtrees. */
+  kdtree_nearest_neighbour(p, dx > 0
+                              ? p->right[node_current]
+                              : p->left[node_current],
+                           point, least_dist, out_nn, depth+1);
+}
+
+
+
+
+
+/* High-level function used to find the nearest neighbour of from a given
+   point. Returns the index of the nearest neighbour in the kd-tree. */
+size_t
+gal_kdtree_nearest_neighbour(gal_data_t *coords_raw, gal_data_t *kdtree,
+                             size_t root, double *point,
+                             double *least_dist)
+{
+  struct kdtree_params p={0};
+  size_t out_nn=GAL_BLANK_SIZE_T;
+
+  /* Initialisation. */
+  p.left_col=kdtree;
+  *least_dist=DBL_MAX;
+  kdtree_prepare(&p, coords_raw);
+
+  /* Use the low-level function to find th nearest neighbour. */
+  kdtree_nearest_neighbour(&p, root, point, least_dist, &out_nn, 0);
+
+  /* For a check
+  printf("%s: root=%zu, out_nn=%zu, least_dis=%f\n",
+         __func__, root, out_nn, least_dist);
+  */
+
+  /* Clean up and return. */
+  kdtree_cleanup(&p, coords_raw);
+  return out_nn;
+}



reply via email to

[Prev in Thread] Current Thread [Next in Thread]