How to handle regressions

mmeeks · October 11, 2014, 5:26pm

Hi there,

Setting up an entire automated build/test environment is simply way
beyond my capability. The best I can do is download installable builds
and simply test them by using them.

Sounds good; that's why we have daily (and sometimes hourly)
snapshots of master - our latest development version available: to make
it easy for people to chase the latest features & help developers with
QA as they implement features.

Meaning - a developer should be responsible for the code they write,
including fixing bugs when they are found *without* resorting to
"'patches welcome' or 'pay someone to fix it'"

So the "when they are found" bit is of course the key =) No-one intends
to introduce new bugs (well, ok some people argue that a new feature is
a bug but this is fairly rare). They happen as an accidental side-effect
of feature development, and/or fixing other bugs - which can often be
entirely un-related and somewhere completely different.

However the cost of a fix - in developer time goes dramatically up the
longer that it takes for the bug report to come in. I know some QA /
development guys who work in a quite tight loop together as a new
feature lands and is polished: that is -by-far- the most effective way
to provide feedback. A bug a year later is far more expensive to fix.

Failing that, we provide great tools to try to close the gap between:
"it broke" and "who is responsible / interested ?" - which is 'bibisect'
- which allows you to go back and run historic versions to find out at
what point (and almost 'by whom') the issue was introduced. Most
developers when you have bibisected, and CC'd them to point at the patch
feel responsible and jump in to fix the bug.
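For anyone who hasn't tried it, the idea behind bibisect is a plain binary search over a series of historic builds. A minimal sketch of that idea, assuming the builds are ordered oldest to newest, the oldest is known good and the newest is known bad; the build names and the `is_bad` check are hypothetical stand-ins (in practice "is it bad" is you running that build and trying your use-case, and the real tool manages the build archive for you):

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() is true.

    Assumes builds[0] is known good, builds[-1] is known bad, and that
    once the bug appears it stays in every later build.
    """
    good, bad = 0, len(builds) - 1  # indices of a known-good and a known-bad build
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid   # bug already present: look earlier
        else:
            good = mid  # bug not yet present: look later
    return builds[bad]


# Hypothetical archive of 100 builds where the bug first shows up in build 37.
builds = [f"build-{n:03d}" for n in range(100)]
culprit = first_bad_build(builds, lambda b: int(b.split("-")[1]) >= 37)
print(culprit)  # build-037
```

With 100 builds that is about 7 manual tests rather than 100, which is why it is a reasonable job for a volunteer.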

Of course - some reporters (and I assume it's not you doing this) at this
point (if you're lucky - often this happens when they find a bug at
all) get a sense of outrage & entitlement and start shouting at the
embarrassed developer, demanding their work is reverted, demanding
processes to stop XYZ committing ABC until they (personally) are happy
etc. This piece of the puzzle tends to have a predictably
counter-productive outcome =) It is worth working hard to not
(accidentally) look like that interaction.

If the vast majority of the developers don't agree with this principle,
and in fact believe that they should be able to just commit code for
something, then go on their merry way and/or respond with the "'patches
welcome' or 'pay someone to fix it'" responses

Given a generic bug reported loong after the development took place:
(ultimately) all bugs are caused by some developer either by action or
omission - I think that's a reasonable approach. The length of time it
takes to file it is -usually- a sign of its relative importance vs. the
other 6000+ open issues =)

Of course, if some reporter wants to help pin-point the regression to a
specific commit, and does a chunk of work to help interest a developer
in fixing it then that might work well too - that reduces the cost of a
fix to (hopefully) a simpler spare-time task.

From 10k feet though looking with an ecosystem perspective it is clear
that we have to stack the economics here to make things sustainable.
Expecting a (perhaps paid) developer to provide effectively indefinite
free support, to all users of every feature they ever implemented across
all other changes from others that may impact it is really not
realistic. If it was, we'd still have Sun around to fix the umpteen open
regressions vs. OpenOffice 2.0

I'm sure that's a familiar model to someone who sells services. If you
do a for-a-fee install & deployment of say 60x Windows version N
machines - as a one-off without an ongoing support contract. You are not
surprised to hear the customer phoning to endlessly complain about XYZ
change not working, and ... expecting that your time is free - and you
can 'just' help them with a given problem for free etc. [ perhaps
you don't ?] but the situation is reasonably analogous. Normally you
solve these problems by having a short period: "Report any issues inside
a month" - after which, they have to pay for support. Not a perfect
analogy, but ... in this case no-one is paying at all it seems.

Hope that helps,

ATB,

Michael.

mmeeks · October 11, 2014, 6:17pm

Hi Paul,

I apologize for not tackling your mail - apparently you hid most of it
after a sign off, and I didn't notice it =)

Please also understand that I don't use the feature in question, and
until now didn't even know it was broken

This is (perhaps) a good heuristic for determining its burning (or not)
importance vs. the other 500 or so open regression bugs (?).

And in the general case, ignoring enhancement requests due to lack of
developer time sounds reasonable, but then what exactly are the
developers doing with their time right now ?

So - developers do exactly what they want to. It is possible to try to
persuade them to do XYZ by winsome argument, challenging them in various
ways, paying them money etc. =) I do all of these from time to time.

Are there so many bugs that they are only fixing bugs ?

There are enough open bugs to consume around 150 man years of straight
full-time developer time, and (as we know) they are the tip of the
iceberg - there are way more filed in the Apache bugzilla from the
legacy code. Also - in fixing them, I'd expect us to create another 50
or so man years of work - so let's see it as a round 200 man-year
problem, and you're good. That's my estimate at least - it takes around
5 days on average to fix a bug (yes some are quick, but some are really
not) - so do the math.
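For those who want to actually do the math, here is a rough sketch. The 5 days per bug, the 150 man years and the extra 50 are the figures above; the 220 working days per year is purely an assumption made for this sketch, so treat the resulting bug count as an order of magnitude, not a measurement:

```python
DAYS_PER_BUG = 5             # average fix time quoted above
FIX_MAN_YEARS = 150          # estimate for the currently open bugs
FOLLOW_ON_MAN_YEARS = 50     # extra work expected from regressions in those fixes
WORKING_DAYS_PER_YEAR = 220  # assumption for this sketch, not a figure from the thread


def implied_bug_count(man_years: float, days_per_bug: float,
                      working_days_per_year: float) -> float:
    """Number of bugs that a given amount of developer time corresponds to."""
    return man_years * working_days_per_year / days_per_bug


open_bugs = implied_bug_count(FIX_MAN_YEARS, DAYS_PER_BUG, WORKING_DAYS_PER_YEAR)
total_man_years = FIX_MAN_YEARS + FOLLOW_ON_MAN_YEARS

print(f"implied open bugs: {open_bugs:.0f}")        # 6600
print(f"total effort: {total_man_years} man years")  # 200
```

Change the assumed working days and the bug count moves a bit, but the conclusion does not: it is thousands of bugs and a couple of hundred man years either way.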

Some people look at that and say: "we should just spend 4-5 years only
fixing bugs" and then try to back that up with some co-ercive: "we
should lock down git and stop any commits that are not bug fixes" to try
to enforce that. Personally I think that's only a winning strategy if
you want to lose all our volunteer (and commercial) developers that
actually do the fixing: ie. it would be completely self-defeating.

So this is not some either/or - it is a hybrid, we try to fix the most
urgent bugs, and also improve the code quality by re-factoring to reduce
the quality impact of future changes, and we write unit tests to try to
stop bugs getting in and we also work on features.

Either way, all attempts to reduce the management of an 8 million line
code-base down to a simple "this or that" type dichotomy seem pretty
doomed to failure to me =) it's about trying to encourage a sensible
balance.

In that case are no further major revisions
expected any time soon? I'm assuming major revisions are still planned
for the near future, so am assuming that features are being added

We have a time-based release schedule - which is de-coupled from any
features. If it was a sensible thing to do (and it is not) we could have
an entire release with no new features in it just bug-fixes.

JFWIW - to get a sense of bug fixes vs. features - just read and crunch
the release notes of a summer vs. a winter release: one has a lot more
features in it because of GSOC - we spend a -lot- of our time bug
fixing.

but where do those features come from if not enhancement requests ?

Haha =) they come from what developers want to implement: ie. those
doing the work get to choose what to do. Some developers read
enhancement requests, most of us have a general feeling of the big
problems in the code we want to nail: some of which are really many
months of work. Of course, features are often paid for by customers of
the various consultancies working around LibreOffice - so then they get
implemented to that customer's time-line [ modulo the existing
time-based release schedule ].

Unless there are both feature and enhancement requests, and if so,
please explain exactly what the difference is, and why only one of
those is considered important right now.

I'd see feature & enhancements as synonyms.

And given the prices quoted for features and bugfixes, I would say
that only the super rich can afford that sort of thing. The rest
of us, if we're not developers, will have to wait.

I don't really see a magic new way to fund eg. the 200 man years of bug
fixing we need (or some think we need assuming we want a zero open bug
count). Economics will ultimately win: there is no money fairy - so we
need to find some way of setting up the economics so it is sustainable;
the rest will fall into place.

It is trying to market itself as if it isn't, as
if it is a serious alternative. And that means it has users. And
keeping users means at least a little pandering.

It's quite probable that we can improve our marketing (as we can improve
everything).

But for most commercial projects, if the users start complaining about
serious bugs, they sure *do* jump to it. They don't put it in a testing
branch and do nothing with it, hoping the users will notice and test it
for them. They fix it, test it, and roll it out to their users.

Last I looked TDF recommended getting commercial support from a
certified provider if you're doing a commercial project. It is
unrealistic to expect volunteers to fix bugs for commercial providers on
their time-line. It is also un-realistic to expect there to be no bugs
in the software - particularly given the legacy we have. It is also
un-realistic to expect the fixes we do not to introduce regressions. It
is also un-realistic to expect none of those to 'escape'. So - perhaps
there is a new approach - but whatever it is it really needs to add-up
economically and answer the question: how can we fund -way- more
developers to work full time on improving LibreOffice. I find all
answers to that interesting, so please do share.

Even if those users are whiney, demanding want-it-all-for-free's,
they're the only users you've got.

So - some users are a pleasure to deal with; do not have an entitlement
attitude, and are happy when/if they get a fix for their issue. Of
course, some users are as you describe, and I'm not sure it's a
terrible shame to lose them.

One doesn't expect old things to be broken, one just expects some new
features to have been added and some bugs to have been fixed. And new
bugs aren't exactly in the changelog

Fixing a given bug is quite likely to introduce a new bug; clearly the
rate is sub 100% but ... that's life in development. We have a vast and
highly-coupled code-base.

And if one discovers a bug, well, one participates in
the bug reporting process, and hopes it gets fixed fairly soon.

Great.

And that's all fine, so far. It's when things aren't fixed, and one
starts pointing it out, and gets blamed for not being part of the
process, and told to either pay for it, fix it yourself, or basically
keep quiet and wait until somebody decides to get round to it, one
starts to get a little peeved.

So - I can understand the frustration; but what alternative is there ?
=) there is no giant pot of money to pay to fix every known bug. 200 man
years is around $4m if you hire cheap Asian developers (who I suspect
will take longer to fix each bug) so say it's some $10m problem.
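To make those two figures concrete, a back-of-envelope sketch. Only the 200 man years, the $4m and the $10m come from the estimate above; reading the gap between them as "same rate, more time per bug" is one interpretation, assumed here for illustration:

```python
MAN_YEARS = 200
LOW_ESTIMATE_USD = 4_000_000    # the cheaper figure quoted above
HIGH_ESTIMATE_USD = 10_000_000  # the more realistic figure quoted above


def cost_per_man_year(total_usd: float, man_years: float) -> float:
    """Average cost of one developer-year implied by a total budget."""
    return total_usd / man_years


low_rate = cost_per_man_year(LOW_ESTIMATE_USD, MAN_YEARS)
high_rate = cost_per_man_year(HIGH_ESTIMATE_USD, MAN_YEARS)

# One reading (an assumption): the rate stays at the low figure, but each
# bug takes this many times longer to fix.
slowdown = HIGH_ESTIMATE_USD / LOW_ESTIMATE_USD

print(f"${low_rate:,.0f} per man year at the low estimate")    # $20,000
print(f"${high_rate:,.0f} per man year at the high estimate")  # $50,000
print(f"implied slowdown factor: {slowdown}")                  # 2.5
```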

Especially when said bug is a major part
of the application. A real show-stopper.

We fix a ton of these before release time and during the lifetime of
the application. There are really very few show-stoppers that I've seen.
Particularly, if the issue is not found for months after release - I'm
personally skeptical that it is indeed a show-stopper: though try
explaining that to a typically angry person that found it and is
convinced it is the worst bug in the world =)

So most of the time there is no problem here, but in this particular,
and unusual case, the standoffish nature of the LO defenders who
refuse to admit any culpability is not helpful.

So - bear in mind that the vast majority if not all developers that
contribute code fix -way- more bugs than they create as an accidental
side-effect.

The user doesn't have a problem here, at least, not for longer than it
takes to uninstall LO and install MSO.

it on its merits, not on its price. Ultimately, if people can easily
just install MSO then why not: they can use LibreOffice for those things
that it's better for and MSO for what it's better for - why are we
worrying about them ?

The user isn't even complaining that LO made a mistake, the user is
complaining that nothing is being done about it, because as far as the
user was aware that was the case. For a major bug that was introduced
into a working feature. And wasn't fixed for more than one major
release. Which is a fairly bad state of affairs. And had LO said "oops,
our bad", and promised to do something about this

So - LO doesn't really have a corporate personality. Developers cannot
and do not read all bugs. If the bug is very serious, we expect QA to
tag it as a 'Most Annoying Bug' and then there is a better chance a
developer might look at and fix it.

But refusing to admit even the slightest blame here, and instead trying
to make out that this whole thing was the user's fault for expecting
anything

Well - it's clearly our fault if the users expect professional support
for free from LibreOffice - we screwed our marketing up =) They -may-
get professional support for free: there are a lot of professionals who
are awesome hackers who love to fix bugs for people just from the
goodness of their hearts, some professional LibreOffice developers do
this stuff in their work time (perhaps a customer filed the same thing),
and some do fixes in their spare time - but it is very far from
guaranteed.

I tend to agree that we need to make this more clear in our bug filing
flow. Setting a clear level of expectation would be very useful to avoid
disappointment. When we have our own bugzilla we can do that.

I see people who say things like: "I filed SIX MONTHS ago, and it's
still not fixed" (yes they shout) - which again is a mismatch in
expectation about timelines and prioritization etc.

> So leave them with a security vulnerability - Good job IT guy

Well, partly that's their choice. They could pay for a bugfix, or live
with a show-stopper, or live with a security vulnerability.

Sure why not.

Although, at the quoted prices, I can't blame them for not wanting
to pay for this to get fixed. Especially when there is a lot of
justification to the argument that if LO broke it, LO should fix it.

So - I know nothing about other people's prices - but at Collabora we
have this crazy approach of charging the average of what it costs to fix
a bug, plus a normal consulting margin. We find that charging less
than what it costs to fix is not a sustainable approach - but we're also
happy to do work under time & material terms if people think their bugs
are easy (sometimes they are in fact).

I suppose that LO could argue that they have no responsibility to keep
the application in a working state

Seriously; LO is in a 'working state' - I use it for work every day,
intensively. We are talking here mostly about corner cases - and ones
that happen not to have broken & been fixed in the era where we write
unit tests for fixes (ie. in the LibreOffice era).

> You can not force a volunteer

No, but you can revoke their commit access.

Same thing; you lose the volunteer. We work very hard to try to
attract volunteer developers. I hear you are a developer - would you
like to have a go ? There are plenty of Easy Hacks to get stuck into
and plenty of easier bugs to fix.

If a dev is going to go around breaking things, and then refusing
to fix them, he is going to cause more harm to the project than good,
and shouldn't be allowed to play in the sandpit.

So there are private social mechanisms that have this effect in the
developer community. But you're assuming a lot here. The first huge
chunk of work is finding out -which- developer broke something. That is
something a volunteer can do with bibisect reasonably easily; and
they're welcome to do.

In this case that is obviously too drastic a measure, but the point
still stands that the developer who broke it does bear some
responsibility to fix it.

And the developer who broke it no doubt does tend to feel that; can you
imagine the effect some of the overkill in the bug has on how they
feel ? They spent all this time implementing a new feature that was
loved by millions, and then some swine flames their ass endlessly about
some minor bug and says they should have their commit access revoked
[ not saying that happened in this case, but it is an -all-too-common-
and sad pattern in the bad bugzilla interactions I get to see ].

And LO needs to admit that it was a rather big blunder. Those happen,
and no-one should be trying to hang LO for it. In fact, LO being able
to admit it would make us all feel a little more sympathetic to LO.

We make blunders all the time =) none of them are intended with malice
to hurt our users. Indeed, we're eager to help our users help us to make
the product better: there should be a smooth ramp for getting involved
with QA, and -nothing- is hidden: all our development is in the open,
that a bug got introduced, and (after months of release candidates)
escaped into the wild is a point of shame for us all. *But* - it is also
a point of shame for those who failed to show up to test the RC's or
master snap-shots with their pet use-case IMNSHO. And it is no issue
at all to point that out. There is not some magic pool of "other people"
who are to blame for a bug escaping =) it's particularly ironic to blame
the very people who are trying to make things better in this regard.

but so far no one seems to want to admit even this most obvious of
faults on LO's part. Which means the response as a whole comes off
as trying to shirk all guilt.

So - I think my point here is that we're all guilty. LibreOffice relies
on volunteers.

What I read from this thread (and was encouraged by FWIW) is that
Tanstaafl seems less upset about the issue than those trying to defend
his corner, and may well try to help-out more testing his customer's
use-case in more recent builds and so on. If that continues to happen -
we have a net win: one more much appreciated QA member.

We will always have bugs; we will always have disagreements. The only
interesting thing is what we do with that.

I'd like to see progress on getting more people doing QA earlier in the
cycle. That is where it is most valuable; ideally working alongside
developers in a friendly & constructive way -really- early in the
process. Sure that means reading the git commit log, playing with master
or RC builds and sending some e-mail - but I think it's worth it.
Everyone else benefits from that - and the load of doing the work to
improve LibreOffice already -heavily- weighs on the few that are willing
to make positive change in the code: anything people can do to get
involved sharing that burden is very much appreciated.

And my economic questions are not just for amusement: I think there is
another vital task here, which is to try to get more funding into the
ecosystem to invest it into LibreOffice in a sustainable way: I spend a
lot of time thinking about & working on that; help appreciated.

ATB,

Michael.

mmeeks · October 11, 2014, 6:30pm

Hi Tanstaafl,

> You call the bug in question a major regression, and forget that the
> people providing quality assurance are indeed volunteers. They either
> catch a regression, or they don't.

Yes, but ...

a) surely you aren't denying the fact that many - most? - of the
Libreoffice *developers* - especially ones working on core functionality
- are actually *paid* coders, are you?

Sure - some small but growing fraction of individuals working on
LibreOffice are paid; that small fraction produce around 2/3rds of total
commits. They are paid BTW not to fix random users' bugs - but to
support the customers of RedHat, Collabora, SUSE, Canonical, CloudOn,
Igalia, etc.

I suspect there is some mis-communication here that there is some
money-fairy around TDF that pays people full-time to fix random
end-users' issues =) Particularly end-users with customers paying them
to provide fixes / support for their issues - when they are depending on
getting that out of 'the community'.

<sigh> Inability to cut/copy/paste from/into fields is a *major*
regression - for anyone who uses them.

Sure - but, hey - it works for me; I have here a writer document
(loaded from a .doc) with fields in it and I copy / paste into it and
-it-just-works- (I have a 4.2 based build). So - I'm not sure that this
bug is quite as debilitating as suggested; there are several different
types of fields etc.

Do you not see this? Is this not so obvious as to almost knock your head
off?

Developers do test their features. We ask and encourage our users to do
the same, and to file bugs.

A month later the OP asked if the 4.3 series would be patched or if we
would have to wait until 4.4, and he was rudely (imnsho) told by Joel to
'feel free to submit a patch', which happens far too often.

So - of course, QA are welcome to prioritize the bug as they see fit.
And whoever implemented the fix is welcome to submit it to whatever
versions they think it is important to go to. Sometimes we get that
wrong - and of course people can indeed jump in and help to get their
patch merged to another branch if they want it there.

No one said they were (or this one was) - but it is also plainly evident
that the developer who pushed this code into production didn't even do
minimal testing.

I haven't read the bug; but there are a truck-load of assumptions behind
what you say that are way too numerous to even address here. Of course
developers test their changes, but there are lots of things that can
subsequently change underneath them that void assumptions that have been
made elsewhere.

Incidentally - what I -least- like about this is the commercial side of
it; it's fair enough to make a mistake in the priority of a bug, or
forget to back-port some fix to somewhere. But there is a -serious-
moral hazard problem if we start to have lots of aggressive arm-twisting
on the users lists around specific bugs for other people's customers
that I'd like to avoid.

Anyhow - I hope that helps; it looks like for better or worse we got
the underlying issue resolved.

ATB,

Michael.