pacd

SYNOPSIS

pacd [--directory=<dir>] [--logfile=<file>] [--pidfile=<file>] [--umask=<mode>] [--default-bitstream=<file>] [--segment=<PCIeSegment>] [--bus=<bus>] [--device=<device>] [--function=<function>] [--upper-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--lower-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--poll-interval <sec>] [--cooldown-interval <sec>] [--no-defaults]

pacd [--default-bitstream=<file>] [--segment=<PCIeSegment>] [--bus=<bus>] [--device=<device>] [--function=<function>] [--upper-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--lower-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--poll-interval <sec>] [--cooldown-interval <sec>] [--no-defaults]

DESCRIPTION

pacd periodically monitors the sensors on the Intel® Programmable Acceleration Card (PAC) Board Management Controller (BMC) and programs a default bitstream if a sensor value exceeds a specified threshold. pacd is only available on the PCIe* Accelerator Cards (PACs).

On systems with multiple PACs, pacd monitors the sensors for all cards in the system using the specified sensor threshold values. If you specify a PCIe address using -S, -B, -D, or -F, pacd monitors only the PACs that match the address components you specify. For example, if you specify only -B 5, pacd monitors all PACs on PCIe bus 5. The sensor thresholds are global: specifying -T 11:95.0:93.0 monitors sensor 11 on all selected PACs, triggers when its value exceeds 95.0, and resets the trigger once the value drops below 93.0.
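The partial-address matching described above can be pictured with a short Python sketch. It is illustrative only and not pacd's implementation: the PciAddress type, the matches() helper, and the card list are hypothetical names and values introduced here.

```python
from typing import NamedTuple, Optional


class PciAddress(NamedTuple):
    segment: int
    bus: int
    device: int
    function: int


def matches(addr: PciAddress,
            segment: Optional[int] = None,
            bus: Optional[int] = None,
            device: Optional[int] = None,
            function: Optional[int] = None) -> bool:
    """Return True if every component that was specified equals the card's."""
    wanted = (segment, bus, device, function)
    return all(w is None or w == have for w, have in zip(wanted, addr))


# Hypothetical cards, for illustration only.
cards = [PciAddress(0, 5, 0, 0), PciAddress(0, 5, 1, 0), PciAddress(0, 9, 0, 0)]

# Equivalent of "-B 5": only the bus is constrained, so both cards on bus 5 match.
selected = [c for c in cards if matches(c, bus=5)]
print(selected)
```

Leaving every component unspecified matches every card, which corresponds to the default of monitoring all PACs.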

To stop pacd, send it SIGINT or SIGTERM. If pacd was started as a service, use systemctl; if it is running as a regular process in the foreground, press ^C.

INSTALLING AS A SYSTEM SERVICE

The tools installation process installs all the files required to run pacd as a systemd service, which can optionally be enabled to start automatically at boot.

To start pacd as a systemd service, first edit the file /etc/sysconfig/pacd.conf as root. This file is shown below.

# Intel Programmable Acceleration Card (PAC) variables.
# Monitors Baseboard Management Controller (BMC) sensors.

############## REQUIRED OPTIONS ################

PIDFile=/tmp/pacd.pid

# Specify default GBS files to consider for PR.  Include '-n' for each.
# ex.: DefaultGBSOptions=-n <Default_GBS_Path> -n <Default_GBS_PATH_2>
DefaultGBSOptions=-n <Default_GBS_Path>
UMask=0
LogFile=/tmp/pacd.log
PollInterval=0
CooldownInterval=0

############## OPTIONAL OPTIONS ################

# Uncomment and specify specific PAC PCI address to monitor.
# Default is to monitor all PACs
#BoardPCIAddr=-S 0 -B 5 -D 0 -F 0

# Specify threshold values. -T for UNR, -t for LNR.
# ex.: ThresholdOptions=-T 4:12.5 -t 7:2.25:2.3
ThresholdOptions=

# Extra advanced options.
# ex.: ExtraOptions=--no-defaults
ExtraOptions=

Edit the DefaultGBSOptions= line, specifying the absolute path(s) of the Accelerator Function (AF) files to be loaded into the device when a threshold is exceeded. Prefix each AF file name with -n.

To start the service, first tell systemd to rescan for services with sudo systemctl daemon-reload, then run sudo systemctl start pacd. The service then runs until it is stopped or the system reboots. To stop the service, use sudo systemctl stop pacd. For pacd to persist across boots, run sudo systemctl enable pacd; sudo systemctl disable pacd reverses this effect.

To verify that the service has started, use either sudo systemctl status pacd -l or sudo journalctl -xe. If you use systemctl, a successful startup displays output similar to the following:

sudo systemctl status pacd -l
● pacd.service - PAC BMC sensor monitor
   Loaded: loaded (/usr/lib/systemd/system/pacd.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-08-23 09:34:59 PDT; 2s ago
  Process: 15694 ExecStart=/usr/local/bin/pacd -d $DefaultGBSOptions -P /usr/local/bin -m $UMask -l $LogFile -p $PIDFile -i $PollInterval -c $CooldownInterval $BoardPCIAddr $ThresholdOptions $ExtraOptions (code=exited, status=0/SUCCESS)
 Main PID: 15698 (pacd)
   CGroup: /system.slice/pacd.service
           └─15698 /usr/local/bin/pacd -d -n /etc/GBSs/default.gbs -P /usr/local/bin -m 0 -l /tmp/pacd.log -p /tmp/pacd.pid -i 0 -c 0

Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Starting PAC BMC sensor monitor...
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: registering default bitstream "/etc/GBSs/default.gbs"
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon path is /usr/local/bin
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon umask is 0x0
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon log file is /tmp/pacd.log
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon pid file is /tmp/pacd.pid
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Polling interval set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Cooldown delay set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Started PAC BMC sensor monitor.

The journalctl output is similar to the following:

Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Starting PAC BMC sensor monitor...
-- Subject: Unit pacd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pacd.service has begun starting up.
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: registering NULL bitstream "/etc/GBSs/NULL.gbs"
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon path is /usr/local/bin
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon umask is 0x0
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon log file is /tmp/pacd.log
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon pid file is /tmp/pacd.pid
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Polling interval set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Cooldown delay set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Started PAC BMC sensor monitor.
-- Subject: Unit pacd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pacd.service has finished starting up.
--
-- The start-up result is done.
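To see how the settings in /etc/sysconfig/pacd.conf turn into the command line shown in the systemctl status output, the following Python sketch mimics the substitution. It assumes the unit loads the file as a systemd environment file and relies on systemd's rule that a bare $NAME word is replaced by the variable's value split at whitespace, yielding zero or more arguments. The parse_conf() and expand() helpers are hypothetical names; this is not how systemd or pacd is implemented.

```python
# Values from the sample pacd.conf, with the <Default_GBS_Path> placeholder
# filled in as it appears in the status output.
conf_text = """
PIDFile=/tmp/pacd.pid
DefaultGBSOptions=-n /etc/GBSs/default.gbs
UMask=0
LogFile=/tmp/pacd.log
PollInterval=0
CooldownInterval=0
#BoardPCIAddr=-S 0 -B 5 -D 0 -F 0
ThresholdOptions=
ExtraOptions=
"""

# ExecStart template copied from the status output.
exec_start = ("/usr/local/bin/pacd -d $DefaultGBSOptions -P /usr/local/bin "
              "-m $UMask -l $LogFile -p $PIDFile -i $PollInterval "
              "-c $CooldownInterval $BoardPCIAddr $ThresholdOptions $ExtraOptions")


def parse_conf(text):
    """Collect KEY=VALUE pairs, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key] = value
    return values


def expand(template, values):
    """Replace each $NAME word with its whitespace-split value (unset -> nothing)."""
    argv = []
    for word in template.split():
        if word.startswith("$"):
            argv.extend(values.get(word[1:], "").split())
        else:
            argv.append(word)
    return argv


argv = expand(exec_start, parse_conf(conf_text))
print(" ".join(argv))
```

Because BoardPCIAddr is commented out and ThresholdOptions and ExtraOptions are empty, they contribute no arguments, which is why the resulting command line ends at -c 0.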

OPTIONS

-d, --daemon

This argument has been deprecated. The pacd command treats it as a no-op.

-P, --directory <dir>

Run from the specified directory (path).

-l, --logfile <file>

Send output to the specified file. If you specify a log file on the command line, pacd writes to both that log file and stdout, stderr, or syslog, depending on whether pacd was started from the command line or as a systemd service.

-p, --pidfile <file>

This argument has been deprecated. The pacd command treats it as a no-op.

-m, --umask <mode>

Use the mode value as the file mode creation mask passed to umask.

-i, --poll-interval <secs>

pacd polls and checks the sensor values every <secs> seconds. The value is a real number, so you can specify a fractional value such as 2.5 for a two-and-a-half-second poll interval.

-c, --cooldown-interval <secs>

Specifies the time in seconds that pacd waits after removing the FPGA driver before re-enabling the driver. During this period, the host cannot access the PAC for any reason.

-n, --default-bitstream <file>

Specify the default bitstream to program when a sensor value exceeds the specified threshold. This option may be specified multiple times. The AF, if any, that matches the FPGA’s PR interface ID is programmed when the sensor’s value exceeds the threshold.
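As a rough illustration of the selection rule, the Python sketch below picks the registered file whose interface ID equals the FPGA's. The pick_bitstream() helper, the file paths, and the ID strings are hypothetical; how pacd reads the ID from an AF file is not covered here, and returning the first match is an assumption.

```python
from typing import Optional, Sequence, Tuple


def pick_bitstream(fpga_interface_id: str,
                   candidates: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the path of the first candidate whose interface ID matches, else None."""
    for path, interface_id in candidates:
        if interface_id == fpga_interface_id:
            return path
    return None


# Hypothetical files and IDs, standing in for several -n options.
registered = [
    ("/etc/GBSs/idle_a.gbs", "id-a"),
    ("/etc/GBSs/idle_b.gbs", "id-b"),
]

print(pick_bitstream("id-b", registered))  # a matching AF exists
print(pick_bitstream("id-c", registered))  # no match: nothing to program
```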

-S, --segment <PCIe segment>

Specify the PCIe segment (domain) of the PAC of interest.

-B, --bus <PCIe bus>

Specify the PCIe bus of the PAC of interest.

-D, --device <PCIe device>

Specify the PCIe device of the PAC of interest.

-F, --function <PCIe function>

Specify the PCIe function of the PAC of interest.

-T, --upper-sensor-threshold <sensor>:<trigger_threshold>[:<reset_threshold>]

Specify the threshold value for a sensor that, when exceeded (sensor value strictly greater than <trigger_threshold>), causes the default bitstream specified with -n that matches the FPGA’s PR Interface ID to be programmed into the FPGA. The sensor is considered triggered (and no PR performed) until its value drops below <reset_threshold>.

You can specify this option multiple times.

The sensors specified are monitored for all specified PACs. There is no mechanism for specifying per-PAC sensor thresholds.
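The trigger and reset behavior of an upper threshold amounts to a small hysteresis state machine. The Python sketch below follows the wording above (trigger when strictly greater than the trigger threshold, stay triggered until the value drops below the reset threshold). The UpperThreshold class and the readings are hypothetical; this is not pacd's code.

```python
class UpperThreshold:
    """Trigger/reset state for one sensor with an upper threshold."""

    def __init__(self, trigger: float, reset: float):
        self.trigger = trigger
        self.reset = reset
        self.triggered = False

    def update(self, value: float) -> bool:
        """Feed one reading; return True only when a new trigger fires."""
        if self.triggered:
            # Stay triggered (no further PR) until the value drops below reset.
            if value < self.reset:
                self.triggered = False
            return False
        if value > self.trigger:  # strictly greater, per the option text
            self.triggered = True
            return True
        return False


sensor11 = UpperThreshold(trigger=95.0, reset=93.0)  # as in "-T 11:95.0:93.0"
readings = [94.0, 95.0, 95.5, 96.0, 94.0, 92.9, 95.1]
fired = [sensor11.update(v) for v in readings]
print(fired)
```

In this run the third reading fires the trigger, the readings of 96.0 and 94.0 are ignored because the sensor is still triggered, 92.9 resets it, and 95.1 fires it again.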

-t, --lower-sensor-threshold <sensor>:<trigger_threshold>[:<reset_threshold>]

Specify the threshold value for a sensor that, when exceeded (sensor value strictly less than <trigger_threshold>), causes the default bitstream specified with -n that matches the FPGA’s PR Interface ID to be programmed into the FPGA. The sensor is considered triggered (and no PR performed) until its value goes above <reset_threshold>.

You can specify this option multiple times.

The sensors specified are monitored for all specified PACs. There is no mechanism for specifying per-PAC sensor thresholds.
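Both -T and -t take an argument of the form <sensor>:<trigger_threshold>[:<reset_threshold>]. The Python sketch below shows one way to split such a string and to check that the reset value lies on the expected side of the trigger value. It assumes decimal sensor numbers and leaves the reset value as None when it is omitted, since the default is not stated here. ThresholdSpec, parse_threshold(), and reset_is_consistent() are hypothetical names, not part of pacd.

```python
from typing import NamedTuple, Optional


class ThresholdSpec(NamedTuple):
    sensor: int
    trigger: float
    reset: Optional[float]  # None when the optional third field is omitted


def parse_threshold(spec: str) -> ThresholdSpec:
    """Split '<sensor>:<trigger>[:<reset>]' into typed fields."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected <sensor>:<trigger>[:<reset>], got {spec!r}")
    reset = float(parts[2]) if len(parts) == 3 else None
    return ThresholdSpec(int(parts[0]), float(parts[1]), reset)


def reset_is_consistent(spec: ThresholdSpec, upper: bool) -> bool:
    """Upper thresholds reset at or below the trigger; lower ones at or above it."""
    if spec.reset is None:
        return True
    return spec.reset <= spec.trigger if upper else spec.reset >= spec.trigger


print(parse_threshold("11:95.0:93.0"))  # an upper threshold, as passed to -T
print(parse_threshold("7:2.25:2.3"))    # a lower threshold, as passed to -t
print(parse_threshold("4:12.5"))        # reset threshold omitted
```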

-N, --no-defaults

By default, pacd monitors the same set of sensors that the BMC monitors. These are the sensors that could trigger a machine reboot. This set is typically all settable, non-recoverable thresholds. Specifying this option tells pacd not to monitor these sensors. This option requires at least one -T or -t option to be specified.

NOTES

pacd is intended to prevent an over-temperature or power “non-recoverable” event from causing the FPGA’s BMC to shut down the PAC. Shutting down the PAC results in a PCIe “surprise removal” which ultimately causes the host to reboot.

The application being accelerated must be able to respond appropriately when the device driver disappears from the system. The application receives a SIGHUP signal when the driver shuts itself down. On receipt of SIGHUP, the application should clean up and exit as soon as possible.
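A minimal Python sketch of this pattern follows. It only illustrates the "clean up and exit promptly on SIGHUP" advice: the run() loop and the work_step and cleanup callables are hypothetical placeholders for whatever the real application does, and nothing here is specific to the FPGA driver.

```python
import signal
import threading

# Set by the signal handler; checked by the main loop.
shutdown_requested = threading.Event()


def on_sighup(signum, frame):
    """Record the request only; the real cleanup happens outside the handler."""
    shutdown_requested.set()


# Must be called from the main thread of the application.
signal.signal(signal.SIGHUP, on_sighup)


def run(work_step, cleanup, poll_seconds=0.1):
    """Call work_step() until SIGHUP arrives, then clean up and return."""
    while not shutdown_requested.is_set():
        work_step()
        # Wake up early if the signal arrives while waiting.
        shutdown_requested.wait(poll_seconds)
    cleanup()
```

Keeping the handler to a single flag update and releasing resources in the main loop avoids doing complex work in signal context.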

TROUBLESHOOTING

If you encounter any issues, you can get debug information by examining stdout and the system logs (journalctl or dmesg).

EXAMPLES

The following command starts pacd as a regular process, programming idle.gbs when sensor 11 (FPGA Core TEMP) exceeds 92.35 degrees C or sensor 0 (Total Input Power) goes out of the range [9.2 - 19.9] Watts.

pacd -n idle.gbs -T 11:92.35 -T 0:19.9 -t 0:9.2
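To make the combined effect of the three thresholds in this command concrete, the Python sketch below evaluates a few made-up readings against them. The strict comparisons follow the -T and -t descriptions; the tripped() helper and the sample readings are hypothetical.

```python
# Thresholds from the example command: -T gives upper limits, -t lower limits.
UPPER = {11: 92.35, 0: 19.9}
LOWER = {0: 9.2}


def tripped(sensor: int, value: float) -> bool:
    """True if the reading is strictly above an upper or strictly below a lower limit."""
    above = sensor in UPPER and value > UPPER[sensor]
    below = sensor in LOWER and value < LOWER[sensor]
    return above or below


samples = [(11, 92.35), (11, 92.4), (0, 9.2), (0, 9.1), (0, 15.0), (0, 20.0)]
for sensor, value in samples:
    print(sensor, value, tripped(sensor, value))
```

Readings that sit exactly on a limit (92.35 for sensor 11, 9.2 for sensor 0) do not trip, because both comparisons are strict.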

Revision History

Document Version    Changes
2019.05.13          Made the following changes: removed the daemon argument; removed the driver-removal-disable argument; removed descriptions of three problems that are fixed in the current release.
2018.08.17          Updated to include new options.
2018.08.08          Initial revision.