Never Change a Running System

Introduction

While the service life of office equipment is limited to three or four years at most, plants, systems and devices used in factory environments are expected to keep running for a minimum of ten years, and in most cases for 15 or 20 years or even longer. But where do these systems stand on security once their operating systems are no longer supported by the manufacturer and updates and patches are no longer available?

Users were given a foretaste of the issues surrounding software longevity with the emergence of the much-anticipated Y2K or millennium bug, now past history. Programmers working back in the 1970s would never have imagined that their Cobol programs would still be in productive use right up to the dawn of the new millennium, and that they would be recalled from their well-deserved retirement to patch the software they had launched all those years previously. And all because, in a bid to save storage capacity, two digits instead of four were used to denote the year. Everything sailed along smoothly until the date was due to change from 1999 to 2000. If no action had been taken, after this key date every computer would have been unable to tell whether the two zeros meant 1900 or 2000, causing mayhem with incorrectly sorted numbers, invalid data records, faulty date calculations and incorrect system clocks. The Massachusetts Institute of Technology calculated that the cost of measures put in place to pre-empt the millennium bug ran to 500 million dollars for the American Medicare program alone.
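The mechanics of the bug are easy to reproduce. The sketch below uses hypothetical records (not the original Cobol) to show how two-digit years break both sorting and date arithmetic once the rollover passes:

```python
# Hypothetical two-digit year records, intended order: 1998..2001.
years = ["98", "99", "00", "01"]
print(sorted(years))       # ['00', '01', '98', '99'] -- year 2000 sorts first

# Date arithmetic breaks the same way: an interval spanning the
# rollover comes out negative.
start, end = 99, 0         # (19)99 .. (20)00
print(end - start)         # -99 instead of 1

# With four-digit years, both problems disappear.
years4 = [1998, 1999, 2000, 2001]
print(sorted(years4))      # [1998, 1999, 2000, 2001]
```

The fix applied in the late 1990s was exactly the last step: widening the stored year to four digits, at the cost of touching every record layout that had been frozen for decades.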

PC-systems conquer control rooms
With the introduction of PC-based control systems, innovation cycles became shorter

But the Y2K problem impacted the factory environment too: many production plants were running obsolete systems that suddenly had to be replaced altogether because the original manufacturer had discontinued all further software and hardware support. This is what happened to one of the big car manufacturers, which had been successfully running production control centers on a PC-based Unix system with Intel 486 processors in its five press shops – all without a hitch.

But Interactive Unix, a state-of-the-art operating system when it was commissioned twenty years earlier, had long since been discontinued. The company that produced it had been sold, and its successors had stopped maintaining the software. To make matters worse, the Intel 486 PCs used were suspected of harboring a Y2K problem.
Changing over to a completely new system has drastic consequences for event processing, database queries and process images: these all have to be reconfigured, tested and commissioned – as far as possible without impinging on running production. One possible evasive tactic in this case was to leave the applications in the process control center unchanged and to adapt the communication to the automation and field levels using uniform standard software. Following testing by the system integrator, the changeover to productive operation took place over a weekend.


The Y2K problem showed how long software systems remain in productive use
January 1st, 2000 is not the only date that has embarrassed IT specialists

Anyone who fondly imagines the Y2K problem to have been a once-in-a-lifetime aberration should take a closer look at what happened when the year changed from 2009 to 2010. January 1st, 2010 saw problems arising with the date change on credit cards, booking systems, antivirus software and spam filters. Thankfully, the next technical glitch along these lines is not anticipated before January 19th, 2038, when the signed 32-bit integer used for system time in Unix systems will reach its maximum value and wrap around, producing timestamps that such systems decode as dates back in December 1901. This will affect not only banks and insurance companies, but also a whole host of small systems such as routers, electronic measuring devices, intelligent sensors and anything else that relies on Unix derivatives.
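The rollover can be computed directly from the definition of Unix time (seconds since January 1, 1970, UTC). The sketch below assumes a signed 32-bit counter, which wraps to a date in December 1901 rather than back to 1970 (an unsigned counter would wrap to the epoch instead):

```python
# The 2038 problem: a signed 32-bit Unix time counter overflows.
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit time_t can hold: 2**31 - 1 seconds.
last_valid = EPOCH + timedelta(seconds=2**31 - 1)
print(last_valid)    # 2038-01-19 03:14:07+00:00

# One second later the counter wraps to -2**31, which naive systems
# decode as a date long before the epoch.
wrapped = EPOCH + timedelta(seconds=-2**31)
print(wrapped)       # 1901-12-13 20:45:52+00:00
```

Embedded systems with decades-long service lives are precisely the ones most likely still to be running 32-bit time handling in 2038, which is why the article's routers, sensors and measuring devices are more exposed than regularly refreshed office PCs.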

Far-reaching consequences of software and hardware decisions

These examples clearly illustrate the strategic implications of a decision to opt for one type of software or another. Once a decision is taken and a particular system introduced, a company will have to live with the programs and remain at the mercy of the provider’s release planning for years to come. The effort and expense involved in maintaining and operating the installed hardware, software and communication components rise with each passing year of use. While process control systems installed on workstations were once expected to run for up to 20 years, this does not apply to PC-based standard components. Once running repair and maintenance costs begin to rise disproportionately, or competitiveness is compromised by the lack of any further scope for upgrading, systems are pronounced ripe for modernization. As in the case of the car manufacturer mentioned above, the plant software is a repository for a company’s entire pool of expertise, encompassing all the technological regulations, proven control algorithms and plant operator displays. The sheer complexity of this type of porting or migration is well illustrated by the scale of the control system used in the Ruhleben sewage plant run by the Berlin Water Board, one of Germany’s biggest water utility and wastewater treatment companies.

Sewage plant Berlin Ruhleben

The Berlin Water Board operates a 9,400-kilometer sewage network with some 225,000 connecting pipes and 147 pumping stations, purifying over 220 million cubic meters of water every year. At the time of its modernization, the process control system in Ruhleben comprised ten operating stations with more than twenty monitors in four control rooms. The sewage treatment plant was monitored and regulated by 45 automation systems.

The control system managed over 750 process displays and 650 graphs charting 1,800 drives and 2,700 measurement points. In total, the system was capable of issuing around 20,000 individual messages and processing more than 210,000 variables. The changeover was staggered over two implementation stages: first, the software was adapted to the new system with tool support. This step-by-step modernization permitted the retention of system peripherals, sensors and input/output modules; following recommissioning, these continued to work with the same technological accuracy as before. Step two was to modernize the hardware, which entailed rewiring, exchanging modules, setting up the distribution boxes and performing a system test. The changeover was performed alongside running operation without causing major disruption.

As this modernization process illustrates, plant owners should always opt for an IT infrastructure which allows flexible and simple integration into existing architectures – even if only for cost reasons. However, apart from a precious few exceptions, no uniform standard exists for connecting different production areas and levels. Machine and plant manufacturers tend to equip their products with the most suitable operating system for the job in hand. Each is fitted with its own communication interface for data exchange, which is not necessarily chosen with the focus on integration capability. This is perfectly in order where machine tools, printers, conveying devices or tanks and overhead reservoirs are remotely located or operate in isolation. However, where these components are installed as part of a networked production facility, what emerges is a highly complex and heterogeneous process landscape linked by a myriad of interfaces which compromise the clarity of the information flow. When the time comes to modernize the system, if not before, this gives rise to substantial additional expense as well as entailing considerable development and porting costs.

Symbiosis of openness, real-time capability and security

Openness versus security

Cost reductions can only be achieved by reducing the complexity of the plant and the control or automation system, and by embracing open standards. Internet technologies based on TCP/IP and Ethernet have been moving in precisely this direction for many years now, in industry as elsewhere. They provide the conditions necessary for standardization and simplification, and link networks and infrastructure worlds which used to operate independently.

On the process level, the various field buses have asserted their position as the industry standard. To allow automation devices, programmable logic controllers and sensors from different manufacturers to be linked within a common flexible network, a quasi-industrial standard was established in the form of OPC (OLE for Process Control), built on Microsoft’s Object Linking and Embedding technology.

Today, the OPC Foundation, to which over 450 companies are now affiliated, is responsible for updating and disseminating the standard. Without OPC, two devices would require precise knowledge of each other’s communication facility in order to be able to communicate, making upgrading and exchange a far more difficult process.

With OPC, it is enough for each device manufacturer to write an OPC-compliant driver just once. OPC is based primarily on Microsoft’s DCOM specification and permits communication across firewall or domain boundaries using an OPC tunnel. Even without an OPC tunnel, communication across routers and firewalls is possible; in this case, authentication takes place against a local user table. One drawback is that an identical local user account has to exist on both the server and the client to handle OPC or DCOM communication; another is that the passwords must match. In a large-scale installation or production plant that integrates a large number of different devices, systems and components via OPC, assigning each device its own user with its own password is highly impracticable, not to say impossible – and real-time requirements rule out complex security measures and data encryption anyway. To achieve adequate performance, the systems have to be integrated in a way that lets them accept data within the network on trust. Every virus scan or security check, every routine to authorize and authenticate a telegram, dramatically reduces plant performance, and can even cause data loss through runtime errors and seriously impair production.
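The cost of per-telegram checks can be made concrete with a toy benchmark. This is an illustrative sketch only (hypothetical key and telegram sizes, standard-library HMAC, no real OPC or DCOM involved): it compares forwarding a telegram on trust with attaching a cryptographic authentication tag to every single message.

```python
# Toy benchmark: trusted forwarding vs. per-telegram authentication.
import hashlib
import hmac
import os
import time

KEY = os.urandom(32)        # hypothetical shared secret
telegram = os.urandom(256)  # one hypothetical 256-byte process telegram

def send_trusted(data):
    # Trusted network: forward the telegram as-is, no checks.
    return data

def send_authenticated(data):
    # Append a 32-byte HMAC-SHA256 tag so the receiver can verify origin.
    tag = hmac.new(KEY, data, hashlib.sha256).digest()
    return data + tag

N = 100_000
t0 = time.perf_counter()
for _ in range(N):
    send_trusted(telegram)
trusted_s = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(N):
    send_authenticated(telegram)
auth_s = time.perf_counter() - t0

print(f"trusted: {trusted_s:.3f}s, authenticated: {auth_s:.3f}s for {N} telegrams")
```

Even this cheap symmetric check is measurably slower than trusted forwarding; full encryption, certificate handling or on-access virus scanning in the data path would cost far more, which is why mid-nineties control networks chose trust inside the perimeter instead.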

Performance versus security

But in the mid-nineties, when OPC was becoming established alongside Windows 95, scarcely anyone gave a thought to the possibility that viruses and Trojans could infiltrate the factory environment. At that time, factory networks were still strictly segregated from office networks, and only a handful of operators involved with servicing and maintenance had access to the control systems. This type of closed-network communication, without any connection to either the internet or the company network, offered adequate protection. The situation in today’s industrial environment has undergone a fundamental shift towards greater flexibility, openness and mobility; network links are now the rule rather than the exception. A specification has been developed in the form of OPC UA (Unified Architecture) which is set to replace all previous specifications, eliminating platform and DCOM dependence. The specification’s key elements were elaborated at the beginning of 2009 with a view to their eventual adoption as the definitive IEC standard (IEC 62541). The fundamental principle is for machine data such as process values, measured values and parameters to be not only transported but also described in machine-readable form. Adequate protection is now provided by dedicated security implementation based on the latest standards.
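The idea of data that is "not only transported but also described" can be sketched conceptually. The snippet below is not the OPC UA API; it is a hypothetical model of an OPC UA-style variable node, where the value always travels together with machine-readable metadata that any client can interpret:

```python
# Conceptual sketch of self-describing machine data (hypothetical model,
# not the real OPC UA object types).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class VariableNode:
    node_id: str            # hypothetical address, e.g. "ns=2;s=Pump1.Flow"
    value: float
    engineering_unit: str   # unit travels with the value
    description: str        # human- and machine-readable meaning
    source_timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

flow = VariableNode(
    node_id="ns=2;s=Pump1.Flow",
    value=42.7,
    engineering_unit="m3/h",
    description="Raw sewage inflow, pump station 1",
)

# A client that has never seen this server before can still make sense
# of the value, because the description and unit arrive with it.
print(f"{flow.description}: {flow.value} {flow.engineering_unit}")
```

In a bare OPC DA or raw field-bus telegram, by contrast, the receiver must already know out-of-band what a given register or item ID means; the self-describing model is what removes that implicit coupling between devices.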

Security is a task shared by more than one participant


However, responsibility for security does not rest solely with the manufacturer and system integrator. Owners and operators also have a duty to ensure that gateways are sufficiently robust, ports are closed, and multiple firewalls are in place to protect the industrial network and block access to control cabinets. Even so, no wholesale assessment of the risk of attack is possible: although would-be perpetrators might, in theory, locate a security breach through which to manipulate or spy on machine controls, they would first have to gain access to the control systems.

Siemens spokesman Simon on the subject: “Exploiting security vulnerability to gain unauthorized access to industrial networks requires direct access to the Simatic S7. As long as systems are running under normal conditions in a secured environment, this is impossible for any would-be perpetrator from the outside.” To date, there is no record of any attacks through Profibus. This would be all but impossible to achieve, if only because of the limited number of users able to connect to this field bus and the absence of any IP or gateway facility. For Ethernet-based Profinet, Siemens offers security products which provide effective protection in the form of Simatic Net Scalance S and Simatic Net Softnet.