[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH v5] spapr: Add support for time base offset migrat
From: |
Alexey Kardashevskiy |
Subject: |
Re: [Qemu-ppc] [PATCH v5] spapr: Add support for time base offset migration |
Date: |
Mon, 14 Apr 2014 18:10:39 +1000 |
User-agent: |
Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 |
On 04/14/2014 05:46 PM, Alexander Graf wrote:
>
> On 14.04.14 09:33, Alexey Kardashevskiy wrote:
>> On 04/14/2014 05:08 PM, Alexander Graf wrote:
>>> On 12.04.14 17:52, Alexey Kardashevskiy wrote:
>>>> This allows guests to have a different timebase origin from the host.
>>>>
>>>> This is needed for migration, where a guest can migrate from one host
>>>> to another and the two hosts might have a different timebase origin.
>>>> However, the timebase seen by the guest must not go backwards, and
>>>> should go forwards only by a small amount corresponding to the time
>>>> taken for the migration.
>>>>
>>>> This is only supported for recent POWER hardware which has the TBU40
>>>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
>>>> 970.
>>>>
>>>> This adds kvm_access_one_reg() to access a special register which is not
>>>> in env->spr.
>>>>
>>>> The feature must be present in the host kernel.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>> ---
>>>> Changes:
>>>> v5:
>>>> * fixed multiple comments in cpu_ppc_get_adjusted_tb and merged it
>>>> into timebase_post_load()
>>>> * removed round_up(1<<24) as KVM is expected to do this anyway
>>>> * removed @freq from migration stream
>>>> * renamed PPCTimebaseOffset to PPCTimebase
>>>> * CLOCKS_PER_SEC is used as a constant which 1000000us/s (man clock)
>>>>
>>>> v4:
>>>> * made it per machine timebase offser rather than per CPU
>>>>
>>>> v3:
>>>> * kvm_access_one_reg moved out to a separate patch
>>>> * tb_offset and host_timebase were replaced with guest_timebase as
>>>> the destionation does not really care of offset on the source
>>>>
>>>> v2:
>>>> * bumped the vmstate_ppc_cpu version
>>>> * defined version for the env.tb_env field
>>>> ---
>>>> hw/ppc/ppc.c | 76
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> hw/ppc/spapr.c | 4 +--
>>>> include/hw/ppc/spapr.h | 1 +
>>>> target-ppc/cpu-qom.h | 15 ++++++++++
>>>> target-ppc/kvm.c | 5 ++++
>>>> trace-events | 3 ++
>>>> 6 files changed, 102 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
>>>> index 71df471..bf61a0a 100644
>>>> --- a/hw/ppc/ppc.c
>>>> +++ b/hw/ppc/ppc.c
>>>> @@ -29,9 +29,11 @@
>>>> #include "sysemu/cpus.h"
>>>> #include "hw/timer/m48t59.h"
>>>> #include "qemu/log.h"
>>>> +#include "qemu/error-report.h"
>>>> #include "hw/loader.h"
>>>> #include "sysemu/kvm.h"
>>>> #include "kvm_ppc.h"
>>>> +#include "trace.h"
>>>> //#define PPC_DEBUG_IRQ
>>>> //#define PPC_DEBUG_TB
>>>> @@ -829,6 +831,80 @@ static void cpu_ppc_set_tb_clk (void *opaque,
>>>> uint32_t freq)
>>>> cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>>>> }
>>>> +static void timebase_pre_save(void *opaque)
>>>> +{
>>>> + PPCTimebase *tb = opaque;
>>>> + uint64_t ticks = cpu_get_real_ticks();
>>>> + PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
>>>> +
>>>> + if (!first_ppc_cpu->env.tb_env) {
>>>> + error_report("No timebase object");
>>>> + return;
>>>> + }
>>>> +
>>>> + tb->time_of_the_day = get_clock_realtime() / 1000;
>>> It'd be good to indicate in the field name that we're in us units. In fact
>>> it probably makes sense to use ns throughout and not convert :).
>>> Nanoseconds are QEMU's "native" internal time format.
>> Ooookay, I'll make it "ns", then no indicator is needed, right?
>
> I still prefer to keep an indicator. Makes it more readable.
>
>>
>>
>>>> + /*
>>>> + * tb_offset is only expected to be changed by migration so
>>>> + * there is no need to update it from KVM here
>>>> + */
>>>> + tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
>>>> +}
>>>> +
>>>> +static int timebase_post_load(void *opaque, int version_id)
>>>> +{
>>>> + PPCTimebase *tb = opaque;
>>>> + CPUState *cpu;
>>>> + PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
>>>> + int64_t tb_off_adj, tb_off;
>>>> + int64_t migration_duration_us, migration_duration_tb, guest_tb,
>>>> host_us;
>>>> + unsigned long freq;
>>>> +
>>>> + if (!first_ppc_cpu->env.tb_env) {
>>>> + error_report("No timebase object");
>>>> + return -1;
>>>> + }
>>>> +
>>>> + freq = first_ppc_cpu->env.tb_env->tb_freq;
>>>> + /*
>>>> + * Calculate timebase on the destination side of migration.
>>>> + * The destination timebase must be not less than the source
>>>> timebase.
>>>> + * We try to adjust timebase by downtime if host clocks are not
>>>> + * too much out of sync (1 second for now).
>>>> + */
>>>> + host_us = get_clock_realtime() / 1000;
>>>> + migration_duration_us = MIN(CLOCKS_PER_SEC, host_us -
>>>> tb->time_of_the_day);
>>> CLOCKS_PER_SEC only make sense with the return value of clock(). I wouldn't
>>> be surprised if this calculation goes totally bonkers on BSD or Windows
>>> systems.
>>
>> http://linux.die.net/man/3/clock
>> C89, C99, POSIX.1-2001. POSIX requires that CLOCKS_PER_SEC equals 1000000
>> independent of the actual resolution.
>>
>> This should be valid for BSD too. Ok, this has no effect on Windows.
>> Replace definition with get_ticks_per_sec()? Defining USEC_PER_SEC (or
>> NSEC_xxx) does not make much sense.
>
> I'd just define NSEC_PER_SEC.
>
>>
>>> I still don't understand why we're doing a MIN() here. I would understand a
>>> MAX() to guard against unsynchronized times on source/destination.
>> If I added time difference between out-of-sync hosts, the delta would be
>> huge and the destination guest would behave really weird until this time
>> difference is over. Guests can cope with accidental timebase jump of
>> seconds but not minutes.
>
> Exactly. That's MAX(), right?
Sorry, I am not following you here :(
MAX gives bigger number than MIN and bigger number is what I want to avoid.
I even tried both, MIN works, MAX does not.
--
Alexey