TL;DR: Let's stay with UTF-8.
Longer version:
I had a (not so) quick look at the code and the amount of effort for
switching our char representation seems unreasonably high.
If we kept our current 8-bit representation, the main "issue" from a user's
point of view would be indexing: a user might expect that a char vector
holding N characters always has N elements, and that indexing the n-th
element returns the n-th character.
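To illustrate that expectation breaking (a sketch in Python, purely for illustration, not code from our tree): with an 8-bit (UTF-8) representation, indexing selects bytes, and a non-ASCII character spans more than one of them.

```python
s = "héllo"               # 'é' (U+00E9) takes two bytes in UTF-8
b = s.encode("utf-8")

print(len(s))             # 5 characters
print(len(b))             # 6 bytes: element counts differ
print(hex(b[1]))          # 0xc3: first byte of 'é', not a whole character
```

So `b[1]` is only half of the second character, which is exactly the surprise a user indexing a char vector byte-by-byte would hit.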
But even if we moved from an 8-bit representation of characters to a 16-bit
representation, we still couldn't represent characters from the higher
Unicode planes with a single char element. And even if we went one step
further and used a 32-bit representation, there are combining characters
(e.g. accents) that attach to a base character. So one user-perceived
character could still be represented by several basic elements (8-bit,
16-bit, or 32-bit).
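A small Python sketch (again just for illustration) shows that combining marks defeat even a 32-bit representation: "ñ" written as base letter plus combining tilde is two code points, so it is two elements no matter how wide the element type is.

```python
s = "n\u0303"                          # 'n' + combining tilde = one visible "ñ"

print(len(s))                          # 2 code points for one character
print(len(s.encode("utf-8")))          # 3 bytes in UTF-8
print(len(s.encode("utf-32-le")) // 4) # still 2 elements with 32-bit units
```

The 32-bit encoding buys fixed-width code points, but not fixed-width characters.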
Thus, indexing into character arrays will always be problematic in some
cases, no matter which UTF flavour we use.
I second Rik's and Michael's reasoning and vote for staying with
8-bit chars.