Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period

Hi, Andrei. I'm preparing a patchset for a pull request.

For this patch, would you mind if I?

1. Generate a single patch from the optimization partition.
2. In the patch above, include my comment and my "Reviewed-by".
3. Add it to the queue with other patchsets.

Then you can improve on top at your leisure, Would that work for you?

On Wed, Aug 9, 2023 at 11:03 PM <gudkov.andrei@huawei.com> wrote:

On Sun, Aug 06, 2023 at 02:16:34PM +0800, Yong Huang wrote:
> On Fri, Aug 4, 2023 at 11:03 PM Andrei Gudkov <gudkov.andrei@huawei.com>
> wrote:
>
> > Introduces alternative argument calc-time-ms, which is the
> > the same as calc-time but accepts millisecond value.
> > Millisecond granularity allows to make predictions whether
> > migration will succeed or not. To do this, calculate dirty
> > rate with calc-time-ms set to max allowed downtime, convert
> > measured rate into volume of dirtied memory, and divide by
> > network throughput. If the value is lower than max allowed
> >
> Not for the patch, I'm just curious about how the predication
> decides the network throughput, I mean QEMU predicts
> if migration will converge based on how fast it sends the data,
> not the actual bandwidth of the interface, which one the
> prediction uses?
>
Currently I use network nominal bandwidth, e.g. 1gbps. It would
be nice, of course, to use measured throughput since it can take
into account network headers overhead (as Wang Lei previously
mentioned), compression, etc., but it looks too complicated to
implement outside of migration process.

> > downtime, then migration will converge.
> >
> > Measurement results for single thread randomly writing to
> > a 1/4/24GiB memory region:
> >
> > +--------------+-----------------------------------------------+
> > | calc-time-ms | dirty rate MiB/s |
> > | +----------------+---------------+--------------+
> > | | theoretical | page-sampling | dirty-bitmap |
> > | | (at 3M wr/sec) | | |
> > +--------------+----------------+---------------+--------------+
> > | 1GiB |
> > +--------------+----------------+---------------+--------------+
> > | 100 | 6996 | 7100 | 3192 |
> > | 200 | 4606 | 4660 | 2655 |
> > | 300 | 3305 | 3280 | 2371 |
> > | 400 | 2534 | 2525 | 2154 |
> > | 500 | 2041 | 2044 | 1871 |
> > | 750 | 1365 | 1341 | 1358 |
> > | 1000 | 1024 | 1052 | 1025 |
> > | 1500 | 683 | 678 | 684 |
> > | 2000 | 512 | 507 | 513 |
> > +--------------+----------------+---------------+--------------+
> > | 4GiB |
> > +--------------+----------------+---------------+--------------+
> > | 100 | 10232 | 8880 | 4070 |
> > | 200 | 8954 | 8049 | 3195 |
> > | 300 | 7889 | 7193 | 2881 |
> > | 400 | 6996 | 6530 | 2700 |
> > | 500 | 6245 | 5772 | 2312 |
> > | 750 | 4829 | 4586 | 2465 |
> > | 1000 | 3865 | 3780 | 2178 |
> > | 1500 | 2694 | 2633 | 2004 |
> > | 2000 | 2041 | 2031 | 1789 |
> > +--------------+----------------+---------------+--------------+
> > | 24GiB |
> > +--------------+----------------+---------------+--------------+
> > | 100 | 11495 | 8640 | 5597 |
> > | 200 | 11226 | 8616 | 3527 |
> > | 300 | 10965 | 8386 | 2355 |
> > | 400 | 10713 | 8370 | 2179 |
> > | 500 | 10469 | 8196 | 2098 |
> > | 750 | 9890 | 7885 | 2556 |
> > | 1000 | 9354 | 7506 | 2084 |
> > | 1500 | 8397 | 6944 | 2075 |
> > | 2000 | 7574 | 6402 | 2062 |
> > +--------------+----------------+---------------+--------------+
> >
> > Theoretical values are computed according to the following formula:
> > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > where size is in bytes, time is in seconds, and wps is number of
> > writes per second.
> >
> > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > ---
> > qapi/migration.json | 14 ++++++--
> > migration/dirtyrate.h | 12 ++++---
> > migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> > 3 files changed, 67 insertions(+), 40 deletions(-)
> >
> > [...]
>
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8843e74b59..82493d6a57 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1849,7 +1849,11 @@
> > # @start-time: start time in units of second for calculation
> > #
> > # @calc-time: time period for which dirty page rate was measured
> > -# (in seconds)
> > +# (rounded down to seconds).
> >
> Does there need an extra comment to emphasize that calc-time shows
> zero if the calc-time-ms is lower than 1000?
>
> > +#
> > +# @calc-time-ms: actual time period for which dirty page rate was
> > +# measured (in milliseconds). Value may be larger than requested
> > +# time period due to measurement overhead.
> > #
> > # @sample-pages: number of sampled pages per GiB of guest memory.
> > # Valid only in page-sampling mode (Since 6.1)
> > @@ -1866,6 +1870,7 @@
> > 'status': 'DirtyRateStatus',
> > 'start-time': 'int64',
> > 'calc-time': 'int64',
> > + 'calc-time-ms': 'int64',
> > 'sample-pages': 'uint64',
> > 'mode': 'DirtyRateMeasureMode',
> > '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > @@ -1908,6 +1913,10 @@
> > # dirty during @calc-time period, further writes to this page will
> > # not increase dirty page rate anymore.
> > #
> > +# @calc-time-ms: the same as @calc-time but in milliseconds. These
> > +# two arguments are mutually exclusive. Exactly one of them must
> > +# be specified. (Since 8.1)
> > +#
> > # @sample-pages: number of sampled pages per each GiB of guest memory.
> > # Default value is 512. For 4KiB guest pages this corresponds to
> > # sampling ratio of 0.2%. This argument is used only in page
> > @@ -1925,7 +1934,8 @@
> > # 'sample-pages': 512} }
> > # <- { "return": {} }
> > ##
> > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > + '*calc-time-ms': 'int64',
> > '*sample-pages': 'int',
> > '*mode': 'DirtyRateMeasureMode'}
> > }
> >
> > diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> > index 594a5c0bb6..869c060941 100644
> > --- a/migration/dirtyrate.h
> > +++ b/migration/dirtyrate.h
> > @@ -31,10 +31,12 @@
> > #define MIN_RAMBLOCK_SIZE 128
> >
> > /*
> > - * Take 1s as minimum time for calculation duration
> > + * Allowed range for dirty page rate calculation (in milliseconds).
> > + * Lower limit relates to the smallest realistic downtime it
> > + * makes sense to impose on migration.
> > */
> > -#define MIN_FETCH_DIRTYRATE_TIME_SEC 1
> > -#define MAX_FETCH_DIRTYRATE_TIME_SEC 60
> > +#define MIN_CALC_TIME_MS 50
> > +#define MAX_CALC_TIME_MS 60000
> >
> > /*
> > * Take 1/16 pages in 1G as the maxmum sample page count
> > @@ -44,7 +46,7 @@
> >
> > struct DirtyRateConfig {
> > uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> > - int64_t sample_period_seconds; /* time duration between two sampling
> > */
> > + int64_t calc_time_ms; /* desired calculation time (in milliseconds) */
> > DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
> > };
> >
> > @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
> > struct DirtyRateStat {
> > int64_t dirty_rate; /* dirty rate in MB/s */
> > int64_t start_time; /* calculation start time in units of second */
> > - int64_t calc_time; /* time duration of two sampling in units of
> > second */
> > + int64_t calc_time_ms; /* actual calculation time (in milliseconds) */
> > uint64_t sample_pages; /* sample pages per GB */
> > union {
> > SampleVMStat page_sampling;
>
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > index 84f1b0fb20..90fb336329 100644
> > --- a/migration/dirtyrate.c
> > +++ b/migration/dirtyrate.c
> > @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t
> > initial_time)
> > msec = current_time - initial_time;
> > } else {
> > g_usleep((msec + initial_time - current_time) * 1000);
> > + /* g_usleep may overshoot */
> > + msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
> >
> The optimization could be a standalone commit along with the content
> below(see the following comment)?
>
OK, I will move it into separate commit.

> > }
> >
> > return msec;
> > @@ -77,9 +79,12 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord
> > dirty_pages,
> > {
> > uint64_t increased_dirty_pages =
> > dirty_pages.end_pages - dirty_pages.start_pages;
> > - uint64_t memory_size_MiB =
> > qemu_target_pages_to_MiB(increased_dirty_pages);
> > -
> > - return memory_size_MiB * 1000 / calc_time_ms;
> > + /*
> > + * multiply by 1000ms/s _before_ converting down to megabytes
> > + * to avoid losing precision
> > + */
> > + return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> > + calc_time_ms;
> >
> Code optimization, could be in a standalone commit.
>
OK, but note that it is an important optimization. Imagine that
calc_time_ms=100 and increased_dirty_pages=1000. If we compute without
this optimization, we will get only 1000/(2^8)*1000/100=30. With
optmization: 1000*1000/(2^8)/100=39.

> > }
> >
> > void global_dirty_log_change(unsigned int flag, bool start)
> > @@ -183,10 +188,9 @@ retry:
> > return duration;
> > }
> >
> > -static bool is_sample_period_valid(int64_t sec)
> > +static bool is_calc_time_valid(int64_t msec)
> > {
> > - if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> > - sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> > + if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
> > return false;
> > }
> >
> > @@ -219,7 +223,8 @@ static struct DirtyRateInfo
> > *query_dirty_rate_info(void)
> >
> > info->status = CalculatingState;
> > info->start_time = DirtyStat.start_time;
> > - info->calc_time = DirtyStat.calc_time;
> > + info->calc_time_ms = DirtyStat.calc_time_ms;
> > + info->calc_time = DirtyStat.calc_time_ms / 1000;
> > info->sample_pages = DirtyStat.sample_pages;
> > info->mode = dirtyrate_mode;
> >
> > @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
> > {
> > DirtyStat.dirty_rate = -1;
> > DirtyStat.start_time = start_time;
> > - DirtyStat.calc_time = config.sample_period_seconds;
> > + DirtyStat.calc_time_ms = config.calc_time_ms;
> > DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
> >
> > switch (config.mode) {
> > @@ -568,7 +573,6 @@ static inline void dirtyrate_manual_reset_protect(void)
> >
> > static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig
> > config)
> > {
> > - int64_t msec = 0;
> > int64_t start_time;
> > DirtyPageRecord dirty_pages;
> >
> > @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> > DirtyRateConfig config)
> > start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > DirtyStat.start_time = start_time / 1000;
> >
> > - msec = config.sample_period_seconds * 1000;
> > - msec = dirty_stat_wait(msec, start_time);
> > - DirtyStat.calc_time = msec / 1000;
> > + DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > start_time);
> >
> > /*
> > * do two things.
> > @@ -609,12 +611,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> > DirtyRateConfig config)
> >
> > record_dirtypages_bitmap(&dirty_pages, false);
> >
> > - DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> > + DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> > + DirtyStat.calc_time_ms);
> > }
> >
> > static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
> > {
> > - int64_t duration;
> > uint64_t dirtyrate = 0;
> > uint64_t dirtyrate_sum = 0;
> > int i = 0;
> > @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct
> > DirtyRateConfig config)
> > DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
> >
> > /* calculate vcpu dirtyrate */
> > - duration = vcpu_calculate_dirtyrate(config.sample_period_seconds *
> > 1000,
> > - &DirtyStat.dirty_ring,
> > - GLOBAL_DIRTY_DIRTY_RATE,
> > - true);
> > -
> > - DirtyStat.calc_time = duration / 1000;
> > + DirtyStat.calc_time_ms = vcpu_calculate_dirtyrate(config.calc_time_ms,
> > +
> > &DirtyStat.dirty_ring,
> > +
> > GLOBAL_DIRTY_DIRTY_RATE,
> > + true);
> >
> > /* calculate vm dirtyrate */
> > for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> > @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct
> > DirtyRateConfig config)
> > {
> > struct RamblockDirtyInfo *block_dinfo = NULL;
> > int block_count = 0;
> > - int64_t msec = 0;
> > int64_t initial_time;
> >
> > rcu_read_lock();
> > @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct
> > DirtyRateConfig config)
> > }
> > rcu_read_unlock();
> >
> > - msec = config.sample_period_seconds * 1000;
> > - msec = dirty_stat_wait(msec, initial_time);
> > + DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > + initial_time);
> > DirtyStat.start_time = initial_time / 1000;
> > - DirtyStat.calc_time = msec / 1000;
> >
> > rcu_read_lock();
> > if (!compare_page_hash_info(block_dinfo, block_count)) {
> > goto out;
> > }
> >
> > - update_dirtyrate(msec);
> > + update_dirtyrate(DirtyStat.calc_time_ms);
> >
> > out:
> > rcu_read_unlock();
> > @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
> > return NULL;
> > }
> >
> > -void qmp_calc_dirty_rate(int64_t calc_time,
> > +void qmp_calc_dirty_rate(bool has_calc_time,
> > + int64_t calc_time,
> > + bool has_calc_time_ms,
> > + int64_t calc_time_ms,
> > bool has_sample_pages,
> > int64_t sample_pages,
> > bool has_mode,
> > @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > return;
> > }
> >
> > - if (!is_sample_period_valid(calc_time)) {
> > - error_setg(errp, "calc-time is out of range[%d, %d].",
> > - MIN_FETCH_DIRTYRATE_TIME_SEC,
> > - MAX_FETCH_DIRTYRATE_TIME_SEC);
> > + if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> > + error_setg(errp, "Exactly one of calc-time and calc-time-ms must"
> > + " be specified");
> > + return;
> > + }
> > + if (has_calc_time) {
> > + /*
> > + * The worst thing that can happen due to overflow is that
> > + * invalid value will become valid.
> > + */
> > + calc_time_ms = calc_time * 1000;
> > + }
> > + if (!is_calc_time_valid(calc_time_ms)) {
> > + error_setg(errp, "Calculation time is out of range[%dms, %dms].",
> > + MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
> > return;
> > }
> >
> > @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > return;
> > }
> >
> > - config.sample_period_seconds = calc_time;
> > + config.calc_time_ms = calc_time_ms;
> > config.sample_pages_per_gigabytes = sample_pages;
> > config.mode = mode;
> >
> > @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict
> > *qdict)
> > mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
> > }
> >
> > - qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> > - mode, &err);
> > + qmp_calc_dirty_rate(true, sec, /* calc_time */
> > + false, 0, /* calc_time_ms */
> > + has_sample_pages, sample_pages,
> > + true, mode,
> > + &err);
> > if (err) {
> > hmp_handle_error(mon, err);
> > return;
> > --
> > 2.30.2
> >
> > The patch set works for me, and I'm inclined to split it into two commits
> as I point out above.
>
> Thanks
>
> Yong
>
> --
> Best regards

Best regards

From:	Yong Huang
Subject:	Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Date:	Sat, 26 Aug 2023 16:32:01 +0800