Kickoff meeting for the review was on 4/3/08. Charles Newsome will collect documents so sent things to him. All the documents are in My Documents\cms\PLC_review.
Strip tracker and pixel safety systems exist. The pixel system is based on th strip tracker system. The pixel code is written in SCL, the strip code in STL. The strip code has more functionality.
What should we focus on? Parts of code?
Testing and parameters - what could go wrong? what cases are missing? What error conditions are missing?
Want us to look at: code, data structures, and cases handled.
Someone would like us to point out the differences between Pixel and Strip code, because the issues raised with regards to the pixel code are likely to apply to the strip code. They want uniformity (at least some of them).
Seems like really two separate reviews are needed:
does the code do what it is supposed to?
are the parameters valid for operation (of this run)
Want data structures values to be changed - not code when configuration happens.
There is no interface to XDAQ.
No need to talk to XDAQ from strips
Pixel needs small interaction with XDAQ for power supply ramping.
PVSS is used now.
pixel code has GUI.
want generation scripts to some degree.
want configuration to come from Oracle.
Claim to be 99% feature-complete (functionality).
DCS - user interface config?
Tested in PVS simulator, but sounds like the tests are not very thorough. The tools sound like they make testing limited.
Discussed the validity of a set of parameters for a particular way to operate. They seem to want a combination of pixel and strip tracker code/configuration.
details from presentation
37-30 Siemans modules, using SCL, See p12 of slides
DB = Data Block
100ms scan rate for sensors? or is this the heatbeat call rate?
>> This is the OB 35 (heart-beat) call rate.
GUI is based on PVSS tools
PVSS: Provides FSM to overall operation of CMS detector
Validation of configuration performed by SCL PLC functions
OB1 (main) and OB35 (timer interrupt routine) are important, OB100 is startup
Heartbeat check in PVSS.
See P22 for the main objective.
Looks like functions check state of the system. (p18)
>> Yes, there are functions which check the state of the PLC system as a whole. >> Note:They are actually presented on p.16, not p.18.
A long series of functions exists that appear to be called from "main". Things like "procGlobalResetRequest" exist.
excluded from review
57-driver (interface to PVSS) not part of this review.
FSM code in PVSS is not part of the review.
This section is largely incomplete due to lack of time to study the overall structure and more of the code. To properly evaluate the overall structure would take more time.
documentation and organization
The Word document and Power Point slides helped me to understand the organization and the overall purpose of the code. Most useful in the text in front of each of the data structure sections. The symbol table sent to me later also helped out.
The pdf files of the individual function was less than desirable.
Establishing the numbering scheme helped organization. It made it easier to locate and know the purpose of code, structures, and data blocks.
I did not get a chance to evaluate the data structures with similarly named things like alarm state to see if more of the alarm logic could be pulled out into shared functions instead of being embedded in the specific channel functions.
Attempting to be consistent across functions is good. I was able to quickly understand sets of function because the logic and names looked similar.
One thing I did have trouble with was understanding how data structure access (modifications and reading) should be restricted from one set of functions to another. Here I am thinking of private access versus public (conceptual in the case of SCL) and contracts between functions (what is a function allowed to modify or interrogate?). Some examples can be seen in the comments in specific function in the other pdf file I've provided:
CRC calculations in the channel reading functions
multiple flags or value that need to be checked to use values not owned by the function checking them
use of IN_OUT parameters to write directly into the data blocks
Another issue is the use of many flags to indicate state. This use makes it difficult to look through the code and know which combinations are illegal, invalid, cannot be attained, and which are valid.
I noticed that many if-then-else statements are used to determine conditions. In several cases, I noticed that a simple assignment of a boolean expression to a variable would suffice. This sort of thing will likely reduce code size and increase execution speed.
Questions from the kickoff
Does this system only read data from sensors (see p12 of talk - dataflow)? No - it directly sets the relays.
Does it set values on its own or rely of commands from above? It sets the values.
Does this system make only decisions or just carry out orders? It makes decisions.
Is the response time 100ms for detecting an issue?
>> 100ms is the rate by which OB 35 (heart-beat generation) gets called. >> The time for detecting (and reacting to) an error conditions is <= twice the OB 1 cycle time ~ 1s right now. >> The Pixel DCS have ordered a more performant PLC (CPU319) model to replace the current (CPU315) one. >> This upgrade will reduce the time for detecting (and reacting to) an error condition to ~100ms >> The PLC to PVSS IO rate is much faster. >> (The new PLC model is capable of processing 10 times more instructions per unit of time than the current model).
Can the PLC know that a device is off after it attempts to shut it off?
>> No, the PLC has no knowledge about the state of the CAEN Power Supplies. >> This information is thus only available in PVSS.
How does it know if it can detect this?
>> We mainly the PLC simulator (PLCSim) included in the Siemens Step7 software up to now.
Do you use a test stand for testing out the code and reactions or do you just use the simulator?
How is this code maintained? SVN? CVS? other system? sounds like nowhere but the filesystem right now
>> We have begun using CVS for managing the development of the Pixel PLC code. >> Up to now, the main issue for setting-up the CVS repository is that the Siemens Step7 software >> stores some information in binary format. >> We need to find a procedure to export this information in ASCII format, >> in order to be able to fully track changes using CVS.
Is there a system test that insures that everything is working?
Are you protected (or should you be protected) against invalid command sequences? Is there such a thing?
Is there such a thing as communications timeouts?
How are the Temp and Humid sensors calibrated? What accuracy is needed? How do you know you are getting good readings besides checking the overall range?
>> The temperature sensors are calibrated with a precision of better than 0.5 degrees Celsius, >> a precision which is sufficient for our purposes. >> The main issue about the "calibration" parameters for temperature sensors is that >> a. Siemens provides the temperature in units of centi-degrees, not degrees, so we need to devide by a factor of 100 >> b. due to space limitations in the inner region of the CMS detector, >> we have to use uncompensated 2-wire read-out over a cable length of about 10m. >> The cable resistance introduces a <2 degrees shift in the temperature values, >> that we correct for with the calibration parameters as well. >> >> Lalith Perera calibrated the HMX humidity sensors. >> The precision of the dew point calibration is about 2C. >> >> The connections of sensors to channels of the PLC system is monitored >> via the PLC Diagnostic functions (OB 8x and SFC 51, SFC 59).
Is the communications reliable? I could not find enough information to know if all the CRC calculations are needed or what guarantees the communication subsystem gives you.
>> Piero Giorgio Verdini and others are actually concerned about the reliability of the S7 driver communication and will be further investigated over time.
The use of OK/ENO and checking data integrity: are these things used or should they be? Looks like OK can be used to check the condition code (status register) that an operation has been carried out successfully.
Looks like there are many data configuration parameter validity checks e.g. in FC402 there is a check that if a channel has been correctly configured. Why is this necessary? Is the system allowed to operate with channels that are wrongly configured? or is this a check if the channel is valid?
>> FC 402 actually checks both: >> first that the configuration parameters uploaded by PVSS are valid, >> second that the Siemens Diagnostic functions do not indicate an error >> for the channel processed by FC 402.
testing and configuration
Is there a way to run through all possible combinations of input automatically and record all outputs? This might be a way to validate that the code is making correct decisions.
Is there a way to read out the current version of the configuration from the PLC? Are downloaded configurations labeled and versioned? Should the code be restricted at compile time to a particular set of configuration versions and labels?
>> Yes, this is what the read-back settings are intended for. >> The read-back settings will be archived in the PVSS conditions database (Oracle). >> >> The configuration parameters downloaded by PVSS into the PLC are foreseen indeed to be labeled and versioned >> (in the PVSS configuration database; based on Oracle) >> This is not implemented yet, but will be implemented in the near future. >> >> The configuration parameters are foreseen to be changed **during operation** of the CMS Pixel detector.
Can the development environment/system produce a set of constraints that can be used for building valid configuration parameter sets? Information such as number of channels available and limits?
>> The "constraints" (upper and lower limits which define a validity range) for configuration parameters are defined in DB 2. >> The configuration parameters uploaded by PVSS are checked for compliance with these limits before they get used by the PLC >> (the checks are implemented in FC 1xx).
Whats the difference between block calls and function calls? Which is used? Looks like FC type is used and is recommended in the appendix of the book.
questions and comments about the Word document
Are you really worried about accidental overwrites? In what ways can this occur? In what way does this mechanism achieve the protection is claims? Why is it necessary at this low level? Does the higher-level system provide ACLs, integrity checking, and confirmation?
>> We are not really worried. The upgraded PLC will give us the opportunity for additional >> error checking if it is needed in the future. >> "Accidental" refers to "in case of a bug in the PLC code". >> The aim of separating the configuration parameters, operator commands and process values >> into separate DataBlocks is make the PLC code more robust in case of such a bug. >> >> PVSS (as the higher-level system) does not perform checks on the configuration parameters.
My concern is that someone will be maintaining limit rules in two places: where the values are entered and stored in the database, and here in the PLC. I'm not sure what the best breakdown is for this. At some level the configuration building application needs to know the rules and available hardware. At some level the PLC hardware should be sure that it is getting something good.
final interlock decision - What external system state causes a trip?
>> The external systems are: >> o Cooling Plant >> o Dry Gas >> o Beam radiation monitor >> >> Checks which verify that the external systems are operational >> are performed by the "Tracker Master PLC". >> The Tracker Master PLC will send a hardwired "kill" signal to the Pixel PLC in case >> the external systems are not operational, which in turn will interlock the CAEN power supplies.
Of the set of functions that evaluate and set relays, 308 looks difference than the others. From the description, it looks like it does not fit in this category.
>> FC 308 actually looks pretty similar to FC 301 to me... >> ...functionally they are very similar: FC 301 evaluates interlock decisions based on "analog" information (temperature and humidity sensors); >> FC 308 evaluates interlock decisions based on "digital" information (digital input channels).
FC 1240 - how does this work out? Is there any issues associated with doing this action?
>> This function has been removed from the Pixel PLC code. >> We will use time servers for the synchronization of PLC time and PVSS time in the future >> (remark: the time servers are still to be setup by Frank Glege).
Is the testing shown here carried out by hand? Is there a way to automate this sort of testing?
>> Yes, right now the testing is carried out "by hand", >> using the PLC simulator (PLCSim) tool provided by Siemens. >> >> We simply have not found an easy way to automate the testing. >> This is a common issue for PLC systems of many CMS subdetectors and needs to be discussed in more >> detail at a later time together with the CMS Silicon Strip Tracker PLC system we believe.
Most of the comments are directly in the Comment Summary pdf file. I did not get a chance to look through FC 301, 308, 1207, and 304. I do want to look through them.
This appear to be a function invoked by the watchdog interrupt. Why is it called every 100ms when you need a heartbeat signal every 1.6 seconds? Why not program it for a call every 1.6 seconds?
>> Both the 100ms and the 1.6s are the result of an initial choice made long ago. It not considered a final value at the time. >> In order to extend the lifetime of the Relays, OB 35 is actually now called every 10s and generates a heart-beat with a period of 2 minutes >> (1 minute Relay open, followed by 1 minute Relay closed). >> The reason why OB 35 is called every 10 seconds rather then every 1min is that in addition to generating the heart-beat signal OB 35 performs >> additional tasks as well for which a 1 minute period would be too long.
I'm trying to learn how configuration works. I've found the following statement:
Watchdog interrupt handling is set in the Hardware Configuration data when the CPU is parameterized.
Where in the CPU parameterization data? Can we see it? How to you "parameterize the CPU"? I am confused as to how OB35 data gets filled in.
>> The CPU parametrization data is defined using a tool provided by Siemens (called "HW-Config"). >> HW-Config allows to set parameters of the PLC system by means of rather intuitive graphical user interfaces. >> For example, using HW-Config we have configured OB 35 to be executed every 10 seconds.
Looks like FC104 does some of the work. DB604 is filled by PVSS driver. FC104 looks like it transfers the data from DB604 to DB104.
>> Yes, exactly. This is what its task is.
(I received a response from Christian regarding this issue, but have not yet looked further into it.)
Why is the result of the BYTE_TO_INT not checked (OK==true)? Should it be?
>> The byte to int conversion cannot fail. >> BYTE_TO_INT just takes the 8 bits which represent the byte, adds 8 bits worth of zeros >> and interprets the resulting bit pattern as a 16 bit integer.
If the data used by the BYTE_TO_INT function is really just configuration data, why is it converted on each invocation of this function as it appears to be? why not once at configuration setting time?
>> This would be possible; it would increase the memory consumption, however. >> (Actually, one would probably use integer rather than byte types already in DB 602 in that case.)
Does the delay parameter exist to prevent the channel from going in and out of alarm too rapidly? If not, what is it used for?
>> The delay parameter is supposed to avoid that the interlocks trigger due to intermittant "spikes" >> which might be caused by wiggling cables/connectors or bad connections. >> (We have indeed seen such "spikes" during commissioning at the Tracker Integration Facility at CERN; >> remark: the delay feature has been introduced by the CMS Silicon Strip Tracker PLC code.)