The idea that runs should only be compared against the previous output from the same cycle (0z to 0z, 12z to 12z) because of infinitesimally small idiosyncrasies in the data (on the order of tenths of a percent) is laughable. Even if there were data blind spots, you either run algorithms to blend and normalise the data, or you backfill with prior data. The overriding error correction - as you quite rightly state - is the updated observational data, which is the precursor to every initialisation.
Therefore, to discount any run, even though it contains perhaps 98% of all operational data, is utter nonsense. I could sympathise with such a view if data blind spots brought the coverage down below 85-90%, but that simply isn't the case. I often find inter-run variance to be, in the main, anecdotal; for example, verification against the GFS suite (0z, 6z, 12z, 18z) doesn't actually tend to favour any one particular initialisation - each has, more or less, periods of better performance than the others - which, to be fair, is exactly what you'd expect from a stochastic model.
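For anyone who wants to check this for themselves, here is a minimal sketch of how you might compare the cycles, assuming you already have matched forecast/observation pairs tagged by initialisation time. The function name, the record layout and the sample numbers are my own placeholders for illustration, not real verification data.

```python
from collections import defaultdict
from math import sqrt


def rmse_by_cycle(records):
    """Return the root-mean-square error for each initialisation cycle.

    `records` is an iterable of (cycle, forecast, observed) tuples,
    e.g. ("06z", 1012.4, 1011.9). This layout is an assumption.
    """
    squared_errors = defaultdict(list)
    for cycle, forecast, observed in records:
        squared_errors[cycle].append((forecast - observed) ** 2)
    # Mean of the squared errors per cycle, then the square root.
    return {
        cycle: sqrt(sum(errs) / len(errs))
        for cycle, errs in squared_errors.items()
    }


# Invented placeholder values, purely to show the call.
sample = [
    ("00z", 1012.0, 1011.0),
    ("00z", 1008.0, 1009.0),
    ("12z", 1010.0, 1010.5),
    ("12z", 1007.0, 1006.5),
]
scores = rmse_by_cycle(sample)
for cycle in sorted(scores):
    print(cycle, round(scores[cycle], 2))
```

If no cycle scores consistently lower over a long enough verification period, that would support treating all four runs on an equal footing.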
I think the next big step for NWP will be incorporating feedback cycles into the algorithms as - at present - my understanding is that all initialisations are run in isolation from one another. I think it's very difficult for conventional NWP to accurately model a long-drawn-out phenomenon (such as an SSW) if, for instance, it is unaware of a temperature growth pattern in the stratosphere. I think we're many years away from feedback-cycling though, as that really would ramp the error rate up!
Back to current assessments...
I see no reason why members shouldn't be optimistic about the current outputs. If you take a wider, more encompassing view (5-7 days or so), what we have seen is, in the first instance, an underlying trend towards cold (as opposed to milder/zonal), but also a growing and generally consistent NWP consensus towards amplifying blocking strength to our NE. These are factors very much in our favour, and it is this wider window through which I personally tend to view NWP (I'm not so much bothered about the inter-run variance).
Background signals are teleconnectively conducive not only to maintaining this relative consistency, but to further building strength into what is an emerging cross-model pattern. I think this context is vitally important when, for instance, you might see the GFS resort to its default zonal modelling. There obviously remains a large degree of uncertainty, but I think this week is when we will see growing momentum behind the blocking signal across NWP outputs.
I think we may be reaching a tipping point here. Exeter had little confidence in the 'extreme' UKMO 12z output as it had little support, and therefore heavily modified it. Okay, that's fair enough. However, the 0z is almost a replication, in broad terms anyway. Moreover, tenuous signs of other NWP beginning to side with the UKMO would - to me anyway - suggest that Exeter will be taking a different view of the 'extreme' UKMO output. If we see another 'extreme' 12z later this afternoon, then there's respectable consistency behind that output, and you'd really have to lean towards it.
Experience would suggest a moderation in the output, but let's see later...
SB