# pacd #

## SYNOPSIS ##

`pacd --daemon [--directory=] [--logfile=] [--pidfile=] [--umask=] [--default-bitstream=] [--segment=] [--bus=] [--device=] [--function=] [--upper-sensor-threshold=:[:]] [--lower-sensor-threshold=:[:]] [--poll-interval ] [--cooldown-interval ] [--no-defaults] [--driver-removal-disable]`

`pacd [--default-bitstream=] [--segment=] [--bus=] [--device=] [--function=] [--upper-sensor-threshold=:[:]] [--lower-sensor-threshold=:[:]] [--poll-interval ] [--cooldown-interval ] [--no-defaults] [--driver-removal-disable]`

## DESCRIPTION ##

`pacd` periodically monitors the sensors on the Intel® Programmable Acceleration Card (PAC) Board Management Controller (BMC). It programs a default bitstream when a sensor's value crosses a specified threshold. `pacd` is only available on the PCIe\* Accelerator Card (PAC).

On systems with multiple PACs, `pacd` monitors the sensors on all cards in the system using the specified sensor threshold values. If a PCIe address is specified (using any of `-S`, `-B`, `-D` or `-F`), `pacd` monitors all PACs that match the specified PCIe address components. For example, if the user specifies only `-B 5`, all PACs on PCIe bus `5` are monitored.

The sensor thresholds are global. Specifying `-T 11:95.0:93.0` monitors sensor `11` on all selected PACs. The sensor triggers when its value exceeds `95.0`, and the trigger resets when the value drops below `93.0`.

Use SIGINT or SIGTERM to stop `pacd`. In daemon mode, send the signal to the process ID stored in the PID file (``kill -2 `cat /tmp/pacd.pid` `` or ``kill -15 `cat /tmp/pacd.pid` ``). When `pacd` runs as a regular process, press `^C`.

## INSTALLING AS A SYSTEM SERVICE ##

The tools installation process installs all the files needed to run `pacd` as a `systemd` service. The service can optionally start automatically at boot.

To start `pacd` as a `systemd` service, first edit the file `/etc/sysconfig/pacd.conf` as root. The file is shown below.

```
# Intel Programmable Acceleration Card (PAC) daemon variables.
# Monitors Baseboard Management Controller (BMC) sensors.

############## REQUIRED OPTIONS ################
PIDFile=/tmp/pacd.pid
# Specify default GBS files to consider for PR. Include '-n' for each.
# ex.: DefaultGBSOptions=-n -n
DefaultGBSOptions=-n
UMask=0
LogFile=/tmp/pacd.log
PollInterval=0
CooldownInterval=0

############## OPTIONAL OPTIONS ################
# Uncomment and specify specific PAC PCI address to monitor.
# Default is to monitor all PACs
#BoardPCIAddr=-S 0 -B 5 -D 0 -F 0
# Specify threshold values. -T for UNR, -t for LNR.
# ex.: ThresholdOptions=-T 4:12.5 -t 7:2.25:2.3
ThresholdOptions=
# Extra advanced options.
# ex.: ExtraOptions=--no-defaults --driver-removal-disable
ExtraOptions=
```

Edit the `DefaultGBSOptions=` line to specify the absolute path of each GBS file to load into the device when a threshold is crossed. Prefix each GBS file name with `-n`.

To start the service, first tell `systemd` to rescan for services with `sudo systemctl daemon-reload`. Then run `sudo systemctl start pacd`. `pacd` then runs as a service until it is stopped or the system reboots. To stop the service, use `sudo systemctl stop pacd`.

To have `pacd` start on every boot, run `sudo systemctl enable pacd`. `sudo systemctl disable pacd` reverses this.

To confirm that the service has started, use either `sudo systemctl status pacd -l` or `sudo journalctl -xe`.
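The `ThresholdOptions=` line in the configuration file above takes `-T` and `-t` values. These are written here as `sensor:threshold[:reset]`, for example `-T 4:12.5 -t 7:2.25:2.3`. The Python sketch below models one reading of the trigger-and-reset (hysteresis) behavior described under OPTIONS. It is not `pacd`'s implementation. The names `ThresholdSpec`, `parse_threshold` and `update` are hypothetical. The sketch also assumes that an omitted reset value defaults to the threshold itself.

```python
from dataclasses import dataclass


@dataclass
class ThresholdSpec:
    """Hypothetical model of one -T/-t "sensor:threshold[:reset]" value."""
    sensor: int
    threshold: float
    reset: float
    upper: bool              # True for -T (trips above), False for -t (trips below)
    triggered: bool = False  # current hysteresis state


def parse_threshold(arg: str, upper: bool) -> ThresholdSpec:
    """Parse "sensor:threshold[:reset]".

    Assumption: when ":reset" is omitted, the reset point equals the threshold.
    """
    parts = arg.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected sensor:threshold[:reset], got {arg!r}")
    sensor = int(parts[0])
    threshold = float(parts[1])
    reset = float(parts[2]) if len(parts) == 3 else threshold
    return ThresholdSpec(sensor, threshold, reset, upper)


def update(spec: ThresholdSpec, value: float) -> bool:
    """Feed one reading; return True only when the sensor newly trips.

    That transition is where the default bitstream would be programmed.
    While triggered, readings cause no further PR until the value crosses
    back over the reset point, which re-arms the sensor.
    """
    if not spec.triggered:
        # Strict comparisons: -T trips when value > threshold, -t when value < threshold.
        tripped = value > spec.threshold if spec.upper else value < spec.threshold
        if tripped:
            spec.triggered = True
            return True
    else:
        # -T re-arms when the value drops below reset, -t when it rises above reset.
        rearmed = value < spec.reset if spec.upper else value > spec.reset
        if rearmed:
            spec.triggered = False
    return False


if __name__ == "__main__":
    spec = parse_threshold("11:95.0:93.0", upper=True)
    for reading in [90.0, 95.5, 96.0, 94.0, 92.5, 95.1]:
        if update(spec, reading):
            print(f"sensor {spec.sensor} tripped at {reading}: program default bitstream")
```

Run as a script, the sketch replays sensor 11 readings against `-T 11:95.0:93.0` and reports trips at 95.5 and 95.1. It ignores 96.0 and 94.0 because the sensor is still triggered, and it re-arms at 92.5. For `-T`, the reset point sits below the trip point. For `-t` it sits above it, as in `-t 7:2.25:2.3`. This gap keeps a reading that hovers near the threshold from causing repeated PRs.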
Using `systemctl`, a successful startup displays output similar to the following:

```
sudo systemctl status pacd -l
● pacd.service - PAC BMC sensor monitor
   Loaded: loaded (/usr/lib/systemd/system/pacd.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-08-23 09:34:59 PDT; 2s ago
  Process: 15694 ExecStart=/usr/local/bin/pacd -d $DefaultGBSOptions -P /usr/local/bin -m $UMask -l $LogFile -p $PIDFile -i $PollInterval -c $CooldownInterval $BoardPCIAddr $ThresholdOptions $ExtraOptions (code=exited, status=0/SUCCESS)
 Main PID: 15698 (pacd)
   CGroup: /system.slice/pacd.service
           └─15698 /usr/local/bin/pacd -d -n /etc/GBSs/default.gbs -P /usr/local/bin -m 0 -l /tmp/pacd.log -p /tmp/pacd.pid -i 0 -c 0

Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Starting PAC BMC sensor monitor...
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon requested
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: registering default bitstream "/etc/GBSs/default.gbs"
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon path is /usr/local/bin
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon umask is 0x0
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon log file is /tmp/pacd.log
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon pid file is /tmp/pacd.pid
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Polling interval set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Cooldown delay set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Started PAC BMC sensor monitor.
```

The `journalctl` output looks similar to:

```
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Starting PAC BMC sensor monitor...
-- Subject: Unit pacd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pacd.service has begun starting up.
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon requested
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: registering NULL bitstream "/etc/GBSs/NULL.gbs"
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon path is /usr/local/bin
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon umask is 0x0
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon log file is /tmp/pacd.log
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon pid file is /tmp/pacd.pid
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Polling interval set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Cooldown delay set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Started PAC BMC sensor monitor.
-- Subject: Unit pacd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pacd.service has finished starting up.
--
-- The start-up result is done.
```

## OPTIONS ##

`-d, --daemon`

When specified, `pacd` executes as a system daemon process.

`-P, --directory `

When running in daemon mode, run from the specified directory (path). If omitted when daemonizing, `/tmp` is used.

`-l, --logfile `

When running in daemon mode, send output to the specified file. When not in daemon mode, output goes to stdout. If omitted when daemonizing, `/tmp/pacd.log` is used.

`-p, --pidfile `

When running in daemon mode, write the daemon's process ID to the specified file. If omitted when daemonizing, `/tmp/pacd.pid` is used.

`-m, --umask `

When running in daemon mode, use the mode value as the file mode creation mask passed to `umask`.
If omitted when daemonizing, `0` is used.

`-i, --poll-interval `

`pacd` polls and checks the sensor values every `secs` seconds. The value is a real number, so a fractional interval can be given (e.g., `2.5` for a two-and-a-half-second poll interval).

`-c, --cooldown-interval `

Specifies the time in seconds that `pacd` waits after removing the FPGA driver before re-enabling it. During this time the host cannot access the PAC for any reason. Not valid in conjunction with `--driver-removal-disable`.

`-n, --default-bitstream `

Specifies a default bitstream to program when a sensor value crosses its threshold. This option may be specified multiple times. When a threshold is crossed, the AF (if any) whose PR interface ID matches the FPGA's is programmed.

`-S, --segment `

Specify the PCIe segment (domain) of the PAC of interest.

`-B, --bus `

Specify the PCIe bus of the PAC of interest.

`-D, --device `

Specify the PCIe device of the PAC of interest.

`-F, --function `

Specify the PCIe function of the PAC of interest.

`-T, --upper-sensor-threshold :[:]`

Specify the upper threshold value for a sensor. When the sensor value is strictly greater than the threshold, the default bitstream specified with `-n` that matches the FPGA's PR interface ID is programmed into the FPGA. The sensor is then considered triggered, and no further PR is performed, until its value drops below the reset value (the optional third field). This option can be specified multiple times. The specified sensors are monitored on all selected PACs; there is no mechanism for specifying per-PAC sensor thresholds.

`-t, --lower-sensor-threshold :[:]`

Specify the lower threshold value for a sensor. When the sensor value is strictly less than the threshold, the default bitstream specified with `-n` that matches the FPGA's PR interface ID is programmed into the FPGA. The sensor is then considered triggered, and no further PR is performed, until its value rises above the reset value (the optional third field).
This option can be specified multiple times. The specified sensors are monitored on all selected PACs; there is no mechanism for specifying per-PAC sensor thresholds.

`-N, --no-defaults`

By default, `pacd` monitors the same set of sensors that the BMC monitors and that could trigger a machine reboot. This set is typically all settable non-recoverable thresholds. Specifying this option tells `pacd` not to monitor these sensors. This option requires at least one `-T` or `-t` to be specified.

`--driver-removal-disable`

This is an advanced option. By default, when a sensor first trips and a PR of the FPGA is required, `pacd` performs these steps:

1. It removes the FPGA device driver for the device.
2. It waits for the cooldown interval (`-c`).
3. It re-enables the driver.
4. It programs the default bitstream into the device.

If this option is specified, `pacd` skips removing the driver and only programs the default bitstream into the device.

## NOTES ##

`pacd` is intended to prevent an over-temperature or power "non-recoverable" event from causing the FPGA's BMC to shut down the PAC. Shutting down the PAC results in a PCIe "surprise removal," which ultimately causes the host to reboot.

Consider the following issues before enabling `pacd`:

1. The application being accelerated must respond appropriately when the device driver disappears from the system. The application receives a SIGHUP signal when the driver shuts itself down. On receipt of SIGHUP, the application should clean up and exit as soon as possible.
2. There is a window in which the running system will reboot if a PR is in progress when a sensor's threshold is tripped.
3. The OS and driver cannot invalidate any pointers that the application holds into FPGA MMIO space. Using the OPAE API to access the MMIO region is strongly recommended to avoid unanticipated reboots.
4. The OS and driver cannot prevent the FPGA from directly accessing host memory, as in a DMA operation from the AFU to the host. A reboot is highly likely if `pacd` attempts a PR because of a threshold trip during a DMA operation.

## TROUBLESHOOTING ##

If you encounter any issues, you can get debug information in two ways:

1. By examining the log file when in daemon mode.
2. By running in non-daemon mode and viewing stdout.

## EXAMPLES ##

The following command starts `pacd` as a daemon process. It programs `my_null_bits.gbs` when any BMC-triggerable threshold is tripped.

`pacd --daemon --default-bitstream=my_null_bits.gbs`

The following command starts `pacd` as a regular process. It programs `idle.gbs` in two cases: when sensor 11 (FPGA Core TEMP) exceeds 92.35 degrees C, or when sensor 0 (Total Input Power) goes outside the range [9.2, 19.9] Watts.

`pacd -n idle.gbs -T 11:92.35 -T 0:19.9 -t 0:9.2`

## Revision History ##

| Document Version | Intel Acceleration Stack Version | Changes |
| ---------------- | -------------------------------- | ------- |
| 2018.08.08 | 1.2 Beta. (Supported with Intel Quartus Prime Pro Edition 17.1.) | Initial revision. |
| 2018.08.17 | 1.2 Beta. (Supported with Intel Quartus Prime Pro Edition 17.1.) | Updated to include new options. |