Friday, February 19, 2010

Toyota's woes: why simplistic solutions are dangerous

Automotive Designline has had a couple of opinion articles targeting the latest trials and tribulations of Toyota.  One is very fatalistic, and seems to call for removing all software in the car:

Much of the issue Toyota faced had to do with floor mats, which don't involve software, and broadening the "software-is-buggy" concern to the entire recall is alarmist.  Even if the recall were wholly due to software, it is not realistic to remove software from today's automobile.  Cars are not going to be designed without software, period.  In fact, if anything, the opposite is happening: more and more software is going into cars.  Customers want piles of features for cheap, and automakers want the flexibility that comes with software solutions.  Innovation and driver safety are enabled by software: adaptive cruise control, digital instrument clusters, pedestrian and animal warning, hybrid powertrains, parking assist, and fuel-saving ignition systems all require software to work.  Lightweight drive-by-wire saves cost and enables safety features like lane drift adjustment.  Anti-lock brakes, once a "scary" feature, have now become commonplace and standard issue.  And infotainment and navigation systems are continually growing in popularity and demand.  Becoming a Luddite or a technophobe will not stop progress.

The other commentary is from a hardware engineer who is rather naive when it comes to software:

His effective point is that "hardware validation is successful, and hardware engineers can get it right, so move those methods to software."  First, that does not reflect my reality, where hardware problems and corner cases are found all the time in embedded development.  Hardware bugs slip through the validation net for lots of reasons: inappropriate tests, unanticipated use, incomplete testing, improper design, and bad requirements, to name a few.  Hardware bugs are far more expensive to fix, so it's usually software to the rescue.  Hardware appears "perfect" because there are almost always software workarounds helping the hardware along!

It is also almost insulting to think that applying hardware verification methods to software is the "fix" for automotive software quality control.  It betrays a poor understanding of software development, and can only be because the author does not understand software processes, the complexity of software, or how software is created.  There is already an entire body of knowledge around software development processes, quality management methods, and best practices.  Software methods are fundamentally different from hardware methods because the two products are created and designed using orthogonal methodologies.  Software engineering disciplines may not be as clear cut as designing a circuit board or an IC, but they exist already!  Those methods are already well understood by automotive companies and suppliers.

If software keeps coming into the vehicle, and we need to make the vehicle safe, what can be done?  Industries like aerospace rely on techniques such as dual-system redundancy, triplicate voting systems, and exhaustive test beds.  Those things add tremendous cost and lengthen development.  Car customers like you and me are not governments or airlines with multimillion-dollar budgets buying products that will be maintained for several decades.  No: customers are not only price sensitive, but time sensitive.  Fickle buyers want this year's features today, not in five years.
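
For readers who have not seen a triplicate voting system, here is a minimal sketch of the idea in Python.  It is illustrative only: the function name `majority_vote` and the sample throttle readings are made up for this example, and a real aerospace voter would run on independent hardware channels rather than on a list in one process.

```python
from collections import Counter


def majority_vote(readings):
    """Return the value reported by a strict majority of redundant channels.

    Raises ValueError when no value has a strict majority, so the caller
    can fall back to a safe state instead of trusting a bad reading.
    """
    value, count = Counter(readings).most_common(1)[0]
    if count * 2 <= len(readings):
        raise ValueError("no majority among redundant channels")
    return value


# Hypothetical readings from three redundant channels; one has failed.
throttle = majority_vote([42, 42, 97])
```

The single faulty channel is simply outvoted.  The price is the one described above: three copies of the hardware and software for every voted signal.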

You can use software to help.  The earlier you find bugs, the better, and some systems are better at early identification of bugs than others.  Multiple-address space designs are better than single-address space designs because they force erroneous code to crash.  This is a good thing: it leads to far quicker isolation, identification and resolution of problems.  A microkernel design like the QNX Neutrino RTOS isolates every system process so that device drivers, software stacks, and applications are all treated with that same level of rigor.  Pro-active measures like high availability systems can ensure that software recovers gracefully from faults.  They are your insurance that the software will still operate properly, even in the face of undetected errors.  Time partitioning systems can be added to ensure that software always gets its minimum required CPU cycles.  Time partitioning protects against unanticipated log-jams of system activity that would otherwise cause missed deadlines and system failures.
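
The crash-and-recover idea can be sketched with ordinary operating system processes.  This is a rough illustration in Python, not QNX code and not the QNX high availability API: the worker script, the `supervise` function, and the restart limit are all invented for the example.  The worker runs in its own process, so when it faults it dies on its own, and the supervisor notices the failure and restarts it.

```python
import subprocess
import sys

# Hypothetical worker: faults on its first two starts, then runs cleanly.
FAULTY_WORKER = """
import sys
attempt = int(sys.argv[1])
if attempt < 2:
    raise RuntimeError("simulated driver fault")
print("worker ok")
"""


def supervise(max_restarts=5):
    """Run the worker in a separate process, restarting it after a fault.

    Returns (number of restarts needed, worker output).
    """
    for attempt in range(max_restarts + 1):
        result = subprocess.run(
            [sys.executable, "-c", FAULTY_WORKER, str(attempt)],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return attempt, result.stdout.strip()
        # The fault stayed inside the worker's own address space;
        # the supervisor is unaffected and can simply try again.
    raise RuntimeError("worker kept failing; giving up")


restarts, output = supervise()
```

Because the fault surfaces as a dead process rather than as silent memory corruption, it is visible immediately, and the rest of the system keeps running while the failed piece is restarted.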

QNX has been helping people design bulletproof software for safety-critical systems for 30 years now.  We know that it isn't easy.  And we know that it requires discipline.  But it doesn't require throwing out the baby with the bathwater.  Nor does it require some kind of silver bullet.  Just good practices and good software.  And good management to pay attention to engineers when they say something isn't yet ready to go.