CmsPixelPlcReview

Kickoff meeting for the review was on 4/3/08. Charles Newsome will collect documents so sent things to him. All the documents are in My Documents\cms\PLC_review.

Strip tracker and pixel safety systems exist. The pixel system is based on th strip tracker system. The pixel code is written in SCL, the strip code in STL. The strip code has more functionality.

Purpose

What should we focus on? Parts of code?

Someone would like us to point out the differences between Pixel and Strip code, because the issues raised with regards to the pixel code are likely to apply to the strip code. They want uniformity (at least some of them).

Seems like really two separate reviews are needed:

  1. does the code do what it is supposed to?

  2. are the parameters valid for operation (of this run)

facts

DCS - user interface config?

Tested in PVS simulator, but sounds like the tests are not very thorough. The tools sound like they make testing limited.

Discussed the validity of a set of parameters for a particular way to operate. They seem to want a combination of pixel and strip tracker code/configuration.

details from presentation

A long series of functions exists that appear to be called from "main". Things like "procGlobalResetRequest" exist.

excluded from review

overall comments

This section is largely incomplete due to lack of time to study the overall structure and more of the code. To properly evaluate the overall structure would take more time.

documentation and organization

The Word document and Power Point slides helped me to understand the organization and the overall purpose of the code. Most useful in the text in front of each of the data structure sections. The symbol table sent to me later also helped out.

The pdf files of the individual function was less than desirable.

data structures

Establishing the numbering scheme helped organization. It made it easier to locate and know the purpose of code, structures, and data blocks.

I did not get a chance to evaluate the data structures with similarly named things like alarm state to see if more of the alarm logic could be pulled out into shared functions instead of being embedded in the specific channel functions.

coding

Attempting to be consistent across functions is good. I was able to quickly understand sets of function because the logic and names looked similar.

One thing I did have trouble with was understanding how data structure access (modifications and reading) should be restricted from one set of functions to another. Here I am thinking of private access versus public (conceptual in the case of SCL) and contracts between functions (what is a function allowed to modify or interrogate?). Some examples can be seen in the comments in specific function in the other pdf file I've provided:

Another issue is the use of many flags to indicate state. This use makes it difficult to look through the code and know which combinations are illegal, invalid, cannot be attained, and which are valid.

I noticed that many if-then-else statements are used to determine conditions. In several cases, I noticed that a simple assignment of a boolean expression to a variable would suffice. This sort of thing will likely reduce code size and increase execution speed.

Questions from the kickoff

  1. Does this system only read data from sensors (see p12 of talk - dataflow)? No - it directly sets the relays.

  2. Does it set values on its own or rely of commands from above? It sets the values.

  3. Does this system make only decisions or just carry out orders? It makes decisions.

  4. Is the response time 100ms for detecting an issue?

  5. >> 100ms is the rate by which OB 35 (heart-beat generation) gets called.
    >> The time for detecting (and reacting to) an error conditions is <= twice the OB 1 cycle time ~ 1s right now.
    >> The Pixel DCS have ordered a more performant PLC (CPU319) model to replace the current (CPU315) one.
    >> This upgrade will reduce the time for detecting (and reacting to) an error condition to ~100ms
    >> The PLC to PVSS IO rate is much faster.
    >> (The new PLC model is capable of processing 10 times more instructions per unit of time than the current model).
    

general questions

How are the Temp and Humid sensors calibrated? What accuracy is needed? How do you know you are getting good readings besides checking the overall range?

>> The temperature sensors are calibrated with a precision of better than 0.5 degrees Celsius,
>> a precision which is sufficient for our purposes.
>> The main issue about the "calibration" parameters for temperature sensors is that
>>  a. Siemens provides the temperature in units of centi-degrees, not degrees, so we need to devide by a factor of 100
>>  b. due to space limitations in the inner region of the CMS detector,
>>     we have to use uncompensated 2-wire read-out over a cable length of about 10m.
>>     The cable resistance introduces a <2 degrees shift in the temperature values,
>>     that we correct for with the calibration parameters as well.
>>
>> Lalith Perera calibrated the HMX humidity sensors.
>> The precision of the dew point calibration is about 2C.
>>
>> The connections of sensors to channels of the PLC system is monitored
>> via the PLC Diagnostic functions (OB 8x and SFC 51, SFC 59).

Is the communications reliable? I could not find enough information to know if all the CRC calculations are needed or what guarantees the communication subsystem gives you.

>> Piero Giorgio Verdini and others are actually concerned about the reliability of the S7 driver communication and will be further investigated over time.

The use of OK/ENO and checking data integrity: are these things used or should they be? Looks like OK can be used to check the condition code (status register) that an operation has been carried out successfully.

Looks like there are many data configuration parameter validity checks e.g. in FC402 there is a check that if a channel has been correctly configured. Why is this necessary? Is the system allowed to operate with channels that are wrongly configured? or is this a check if the channel is valid?

>> FC 402 actually checks both:
>> first that the configuration parameters uploaded by PVSS are valid,
>> second that the Siemens Diagnostic functions do not indicate an error 
>> for the channel processed by FC 402.

testing and configuration

Is there a way to run through all possible combinations of input automatically and record all outputs? This might be a way to validate that the code is making correct decisions.

Is there a way to read out the current version of the configuration from the PLC? Are downloaded configurations labeled and versioned? Should the code be restricted at compile time to a particular set of configuration versions and labels?

>> Yes, this is what the read-back settings are intended for.
>> The read-back settings will be archived in the PVSS conditions database (Oracle).
>>
>> The configuration parameters downloaded by PVSS into the PLC are foreseen indeed to be labeled and versioned
>> (in the PVSS configuration database; based on Oracle)
>> This is not implemented yet, but will be implemented in the near future.
>>
>> The configuration parameters are foreseen to be changed **during operation** of the CMS Pixel detector.

Can the development environment/system produce a set of constraints that can be used for building valid configuration parameter sets? Information such as number of channels available and limits?

>> The "constraints" (upper and lower limits which define a validity range) for configuration parameters are defined in DB 2.
>> The configuration parameters uploaded by PVSS are checked for compliance with these limits before they get used by the PLC
>> (the checks are implemented in FC 1xx).

language questions

Whats the difference between block calls and function calls? Which is used? Looks like FC type is used and is recommended in the appendix of the book.

questions and comments about the Word document

Page 2

Are you really worried about accidental overwrites? In what ways can this occur? In what way does this mechanism achieve the protection is claims? Why is it necessary at this low level? Does the higher-level system provide ACLs, integrity checking, and confirmation?

>> We are not really worried. The upgraded PLC will give us the opportunity for additional
>> error checking if it is needed in the future.
>> "Accidental" refers to "in case of a bug in the PLC code".
>> The aim of separating the configuration parameters, operator commands and process values
>> into separate DataBlocks is make the PLC code more robust in case of such a bug.
>>
>> PVSS (as the higher-level system) does not perform checks on the configuration parameters.

My concern is that someone will be maintaining limit rules in two places: where the values are entered and stored in the database, and here in the PLC. I'm not sure what the best breakdown is for this. At some level the configuration building application needs to know the rules and available hardware. At some level the PLC hardware should be sure that it is getting something good.

page 3

final interlock decision - What external system state causes a trip?

>> The external systems are:
>>  o Cooling Plant
>>  o Dry Gas
>>  o Beam radiation monitor
>>
>> Checks which verify that the external systems are operational
>> are performed by the "Tracker Master PLC".
>> The Tracker Master PLC will send a hardwired "kill" signal to the Pixel PLC in case
>> the external systems are not operational, which in turn will interlock the CAEN power supplies.

page 63

Of the set of functions that evaluate and set relays, 308 looks difference than the others. From the description, it looks like it does not fit in this category.

>> FC 308 actually looks pretty similar to FC 301 to me...
>> ...functionally they are very similar: FC 301 evaluates interlock decisions based on "analog" information (temperature and humidity sensors);
>> FC 308 evaluates interlock decisions based on "digital" information (digital input channels).

FC 1240 - how does this work out? Is there any issues associated with doing this action?

>> This function has been removed from the Pixel PLC code.
>> We will use time servers for the synchronization of PLC time and PVSS time in the future
>> (remark: the time servers are still to be setup by Frank Glege).

page 79

Is the testing shown here carried out by hand? Is there a way to automate this sort of testing?

>> Yes, right now the testing is carried out "by hand",
>> using the PLC simulator (PLCSim) tool provided by Siemens.
>>
>> We simply have not found an easy way to automate the testing.
>> This is a common issue for PLC systems of many CMS subdetectors and needs to be discussed in more 
>> detail at a later time together with the CMS Silicon Strip Tracker PLC system we believe.

specific questions

Most of the comments are directly in the Comment Summary pdf file. I did not get a chance to look through FC 301, 308, 1207, and 304. I do want to look through them.

OB35

This appear to be a function invoked by the watchdog interrupt. Why is it called every 100ms when you need a heartbeat signal every 1.6 seconds? Why not program it for a call every 1.6 seconds?

>> Both the 100ms and the 1.6s are the result of an initial choice made long ago. It not considered a final value at the time.
>> In order to extend the lifetime of the Relays, OB 35 is actually now called every 10s and generates a heart-beat with a period of 2 minutes
>> (1 minute Relay open, followed by 1 minute Relay closed).
>> The reason why OB 35 is called every 10 seconds rather then every 1min is that in addition to generating the heart-beat signal OB 35 performs 
>> additional tasks as well for which a 1 minute period would be too long.

I'm trying to learn how configuration works. I've found the following statement:

  Watchdog interrupt handling is set in the Hardware Configuration data when the CPU is parameterized.

Where in the CPU parameterization data? Can we see it? How to you "parameterize the CPU"? I am confused as to how OB35 data gets filled in.

>> The CPU parametrization data is defined using a tool provided by Siemens (called "HW-Config").
>> HW-Config allows to set parameters of the PLC system by means of rather intuitive graphical user interfaces.
>> For example, using HW-Config we have configured OB 35 to be executed every 10 seconds.

Looks like FC104 does some of the work. DB604 is filled by PVSS driver. FC104 looks like it transfers the data from DB604 to DB104.

>> Yes, exactly. This is what its task is.

(I received a response from Christian regarding this issue, but have not yet looked further into it.)

FC402

Why is the result of the BYTE_TO_INT not checked (OK==true)? Should it be?

>> The byte to int conversion cannot fail.
>> BYTE_TO_INT just takes the 8 bits which represent the byte, adds 8 bits worth of zeros
>> and interprets the resulting bit pattern as a 16 bit integer.

If the data used by the BYTE_TO_INT function is really just configuration data, why is it converted on each invocation of this function as it appears to be? why not once at configuration setting time?

>> This would be possible; it would increase the memory consumption, however.
>> (Actually, one would probably use integer rather than byte types already in DB 602 in that case.)

Does the delay parameter exist to prevent the channel from going in and out of alarm too rapidly? If not, what is it used for?

>> The delay parameter is supposed to avoid that the interlocks trigger due to intermittant "spikes"
>> which might be caused by wiggling cables/connectors or bad connections.
>> (We have indeed seen such "spikes" during commissioning at the Tracker Integration Facility at CERN;
>>  remark: the delay feature has been introduced by the CMS Silicon Strip Tracker PLC code.) 

last edited 2008-04-24 23:48:58 by

JimKowalkowski