pacd
SYNOPSIS
pacd --daemon [--directory=<dir>] [--logfile=<file>] [--pidfile=<file>] [--umask=<mode>] [--default-bitstream=<file>] [--segment=<PCIeSegment>] [--bus=<bus>] [--device=<device>] [--function=<function>] [--upper-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--lower-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--poll-interval <sec>] [--cooldown-interval <sec>] [--no-defaults] [--driver-removal-disable]
pacd [--default-bitstream=<file>] [--segment=<PCIeSegment>] [--bus=<bus>] [--device=<device>] [--function=<function>] [--upper-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--lower-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--poll-interval <sec>] [--cooldown-interval <sec>] [--no-defaults] [--driver-removal-disable]
DESCRIPTION
pacd periodically monitors the sensors on the Intel® Programmable Acceleration Card (PAC) Board Management Controller (BMC) and programs a default bitstream when a sensor’s value crosses a specified threshold. pacd is only available on the PCIe* Accelerator Card (PAC).
On systems with multiple PACs, pacd monitors the sensors on all cards in the system using the specified sensor threshold values. If a PCIe address is specified (that is, any of -S, -B, -D, or -F), pacd monitors all PACs that match the specified address components. For example, if the user specifies only -B 5, all PACs on PCIe bus 5 are monitored. The sensor thresholds are global, so specifying -T 11:95.0:93.0 monitors sensor 11 on all selected PACs, triggers when its value exceeds 95.0, and resets the trigger once the value drops below 93.0.
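The address filter behaves like a partial match in which omitted components act as wildcards. The following Python sketch is a hypothetical model of that documented behavior, not pacd’s implementation; the PciAddr type and the select_pacs function are illustrative names.

```python
from typing import List, NamedTuple, Optional


class PciAddr(NamedTuple):
    """Hypothetical description of one PAC's PCIe address."""
    segment: int
    bus: int
    device: int
    function: int


def select_pacs(cards: List[PciAddr],
                segment: Optional[int] = None,
                bus: Optional[int] = None,
                device: Optional[int] = None,
                function: Optional[int] = None) -> List[PciAddr]:
    """Return the cards that match every address component given.

    A component left as None (its -S/-B/-D/-F option was not specified)
    matches any value, so calling with no components selects every card.
    """
    wanted = {"segment": segment, "bus": bus,
              "device": device, "function": function}
    return [card for card in cards
            if all(value is None or getattr(card, field) == value
                   for field, value in wanted.items())]


if __name__ == "__main__":
    cards = [PciAddr(0, 5, 0, 0), PciAddr(0, 6, 0, 0), PciAddr(1, 5, 0, 0)]
    # Equivalent of specifying only "-B 5": both bus-5 cards are selected.
    print(select_pacs(cards, bus=5))
```

Calling select_pacs with no components returns every card, which corresponds to running pacd without any PCIe address options.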
Use SIGINT or SIGTERM to stop pacd. In daemon mode, send the signal to the PID recorded in the PID file (with the default PID file, kill -2 `cat /tmp/pacd.pid` or kill -15 `cat /tmp/pacd.pid`); when pacd runs as a regular process, press ^C.
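Scripts that manage the daemon can perform the same shutdown from Python. This sketch assumes the default PID file /tmp/pacd.pid and that the file holds only the decimal PID; stop_pacd is a hypothetical helper, not part of the tools installation.

```python
import os
import signal


def stop_pacd(pidfile: str = "/tmp/pacd.pid",
              sig: int = signal.SIGTERM) -> int:
    """Send sig to the process whose PID is stored in pidfile.

    With the defaults this is equivalent to kill -15 on the PID read from
    /tmp/pacd.pid; pass signal.SIGINT for the kill -2 form.
    Returns the PID that was signalled.
    """
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)  # raises ProcessLookupError if pacd is not running
    return pid
```

Passing sig=0 sends no signal and only checks that the process exists, which is a convenient way to probe whether the daemon is still running.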
INSTALLING AS A SYSTEM SERVICE
The tools installation process installs all the files required to run pacd as a systemd service, which can optionally start automatically at boot.
To start pacd as a systemd service, first edit the file /etc/sysconfig/pacd.conf as root. The file is shown below.
# Intel Programmable Acceleration Card (PAC) daemon variables.
# Monitors Baseboard Management Controller (BMC) sensors.
############## REQUIRED OPTIONS ################
PIDFile=/tmp/pacd.pid
# Specify default GBS files to consider for PR. Include '-n' for each.
# ex.: DefaultGBSOptions=-n <Default_GBS_Path> -n <Default_GBS_PATH_2>
DefaultGBSOptions=-n <Default_GBS_Path>
UMask=0
LogFile=/tmp/pacd.log
PollInterval=0
CooldownInterval=0
############## OPTIONAL OPTIONS ################
# Uncomment and specify specific PAC PCI address to monitor.
# Default is to monitor all PACs
#BoardPCIAddr=-S 0 -B 5 -D 0 -F 0
# Specify threshold values. -T for UNR, -t for LNR.
# ex.: ThresholdOptions=-T 4:12.5 -t 7:2.25:2.3
ThresholdOptions=
# Extra advanced options.
# ex.: ExtraOptions=--no-defaults --driver-removal-disable
ExtraOptions=
Edit the DefaultGBSOptions= line to specify the absolute path(s) of the GBS files to be loaded into the device when a threshold is exceeded. Prefix each GBS file name with -n.
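To see how the variables in pacd.conf become command-line options, the following Python sketch expands the ExecStart template that systemctl status pacd -l reports for the installed unit. It assumes the unit loads /etc/sysconfig/pacd.conf as an EnvironmentFile and that, as systemd does for an unquoted $VAR word, each value is split on whitespace and empty values disappear; parse_conf and build_command are hypothetical helpers.

```python
from typing import Dict, List

# ExecStart template as reported by "systemctl status pacd -l".
EXEC_START = ("/usr/local/bin/pacd -d $DefaultGBSOptions -P /usr/local/bin "
              "-m $UMask -l $LogFile -p $PIDFile -i $PollInterval "
              "-c $CooldownInterval $BoardPCIAddr $ThresholdOptions $ExtraOptions")


def parse_conf(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    variables: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        variables[key.strip()] = value.strip()
    return variables


def build_command(variables: Dict[str, str]) -> List[str]:
    """Expand each $Name word of EXEC_START into zero or more arguments.

    Values are split on whitespace; unset or empty variables contribute
    nothing, which is why a commented-out BoardPCIAddr means "all PACs".
    """
    argv: List[str] = []
    for word in EXEC_START.split():
        if word.startswith("$"):
            argv.extend(variables.get(word[1:], "").split())
        else:
            argv.append(word)
    return argv
```

With DefaultGBSOptions=-n /etc/GBSs/default.gbs and the other values left as shipped, the expanded argument list matches the pacd command line shown in the CGroup line of the sample systemctl output.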
To start the service, first tell systemd to rescan for services with sudo systemctl daemon-reload, then issue sudo systemctl start pacd. This starts pacd as a service that runs until the next boot. To stop the service, use sudo systemctl stop pacd. To make pacd persist across boots, issue sudo systemctl enable pacd; sudo systemctl disable pacd reverses this.
To verify that the service has started, use either sudo systemctl status pacd -l or sudo journalctl -xe. With systemctl, a successful startup displays output similar to the following:
sudo systemctl status pacd -l
● pacd.service - PAC BMC sensor monitor
Loaded: loaded (/usr/lib/systemd/system/pacd.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2018-08-23 09:34:59 PDT; 2s ago
Process: 15694 ExecStart=/usr/local/bin/pacd -d $DefaultGBSOptions -P /usr/local/bin -m $UMask -l $LogFile -p $PIDFile -i $PollInterval -c $CooldownInterval $BoardPCIAddr $ThresholdOptions $ExtraOptions (code=exited, status=0/SUCCESS)
Main PID: 15698 (pacd)
CGroup: /system.slice/pacd.service
└─15698 /usr/local/bin/pacd -d -n /etc/GBSs/default.gbs -P /usr/local/bin -m 0 -l /tmp/pacd.log -p /tmp/pacd.pid -i 0 -c 0
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Starting PAC BMC sensor monitor...
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon requested
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: registering default bitstream "/etc/GBSs/default.gbs"
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon path is /usr/local/bin
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon umask is 0x0
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon log file is /tmp/pacd.log
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon pid file is /tmp/pacd.pid
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Polling interval set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Cooldown delay set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Started PAC BMC sensor monitor.
The journalctl output looks similar to:
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Starting PAC BMC sensor monitor...
-- Subject: Unit pacd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pacd.service has begun starting up.
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon requested
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: registering default bitstream "/etc/GBSs/default.gbs"
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon path is /usr/local/bin
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon umask is 0x0
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon log file is /tmp/pacd.log
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon pid file is /tmp/pacd.pid
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Polling interval set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Cooldown delay set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Started PAC BMC sensor monitor.
-- Subject: Unit pacd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pacd.service has finished starting up.
--
-- The start-up result is done.
OPTIONS
-d, --daemon
When specified, pacd executes as a system daemon process.
-P, --directory <dir>
When running in daemon mode, run from the specified directory (path). If omitted when daemonizing, /tmp is used.
-l, --logfile <file>
When running in daemon mode, send output to file. When not in daemon mode, output goes to stdout. If omitted when daemonizing, /tmp/pacd.log is used.
-p, --pidfile <file>
When running in daemon mode, write the daemon’s process ID to file. If omitted when daemonizing, /tmp/pacd.pid is used.
-m, --umask <mode>
When running in daemon mode, use mode as the file mode creation mask passed to umask. If omitted when daemonizing, 0 is used.
-i, --poll-interval <secs>
pacd polls and checks the sensor values every secs seconds. The value is a real number, so a fractional interval can be specified (e.g., 2.5 for a two-and-a-half-second poll interval).
-c, --cooldown-interval <secs>
Specifies the time in seconds that pacd waits after removing the FPGA driver before re-enabling it. During this interval the host cannot access the PAC for any reason. Not valid in conjunction with --driver-removal-disable.
-n, --default-bitstream <file>
Specify the default bitstream to program when a sensor threshold is tripped. This option may be specified multiple times. When a sensor’s threshold is tripped, the AF (if any) whose PR interface ID matches the FPGA’s PR interface ID is programmed.
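Choosing among several -n bitstreams can be modeled as a lookup by PR interface ID. In this Python sketch each candidate is a hypothetical (path, interface ID) pair whose ID would come from the GBS metadata; how pacd reads that metadata is not shown, and returning the first match when several qualify is an assumption.

```python
import uuid
from typing import List, Optional, Tuple


def pick_default_bitstream(candidates: List[Tuple[str, str]],
                           fpga_pr_interface_id: str) -> Optional[str]:
    """Return the path of the first candidate whose PR interface ID matches.

    candidates holds hypothetical (gbs_path, pr_interface_id) pairs, one
    per -n option. IDs are compared as UUID values, so letter case and
    hyphenation do not matter. Returns None when no candidate matches.
    """
    target = uuid.UUID(fpga_pr_interface_id)
    for path, pr_id in candidates:
        if uuid.UUID(pr_id) == target:
            return path
    return None
```

Returning None mirrors the "if any" wording: when no supplied bitstream is compatible with the FPGA, there is nothing to program.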
-S, --segment <PCIe segment>
Specify the PCIe segment (domain) of the PAC of interest.
-B, --bus <PCIe bus>
Specify the PCIe bus of the PAC of interest.
-D, --device <PCIe device>
Specify the PCIe device of the PAC of interest.
-F, --function <PCIe function>
Specify the PCIe function of the PAC of interest.
-T, --upper-sensor-threshold <sensor>:<trigger_threshold>[:<reset_threshold>]
Specify the threshold value for a sensor that, when exceeded (sensor value strictly greater than <trigger_threshold>), causes the default bitstream specified with -n that matches the FPGA’s PR Interface ID to be programmed into the FPGA. The sensor is considered triggered (and no further PR is performed) until its value drops below <reset_threshold>.
This option can be specified multiple times.
The specified sensors are monitored on all selected PACs. There is no mechanism for specifying per-PAC sensor thresholds.
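The threshold argument format can be parsed as shown in this Python sketch. It is illustrative only: the Threshold type is hypothetical, and representing an omitted reset value as None is an assumption, since this page does not say which reset threshold pacd uses in that case.

```python
from typing import NamedTuple, Optional


class Threshold(NamedTuple):
    """Hypothetical parsed form of one -T/-t argument."""
    sensor: int
    trigger: float
    reset: Optional[float]  # None when the optional third field is omitted


def parse_threshold(spec: str) -> Threshold:
    """Parse '<sensor>:<trigger_threshold>[:<reset_threshold>]'."""
    fields = spec.split(":")
    if len(fields) not in (2, 3):
        raise ValueError(
            f"expected <sensor>:<threshold>[:<reset>], got {spec!r}")
    sensor = int(fields[0])        # sensor number, e.g. 11
    trigger = float(fields[1])     # real-valued threshold, e.g. 95.0
    reset = float(fields[2]) if len(fields) == 3 else None
    return Threshold(sensor, trigger, reset)
```

For example, parse_threshold("11:95.0:93.0") yields sensor 11 with trigger 95.0 and reset 93.0.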
-t, --lower-sensor-threshold <sensor>:<trigger_threshold>[:<reset_threshold>]
Specify the threshold value for a sensor that, when crossed (sensor value strictly less than <trigger_threshold>), causes the default bitstream specified with -n that matches the FPGA’s PR Interface ID to be programmed into the FPGA. The sensor is considered triggered (and no further PR is performed) until its value rises above <reset_threshold>.
This option can be specified multiple times.
The specified sensors are monitored on all selected PACs. There is no mechanism for specifying per-PAC sensor thresholds.
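The trigger/reset behavior of -T and -t is a simple hysteresis. The Python sketch below models one sensor threshold under the semantics described above (a strict comparison trips it, and it re-arms once the value passes the reset threshold). SensorWatch is a hypothetical name, and the sketch assumes a reset threshold was supplied.

```python
from dataclasses import dataclass


@dataclass
class SensorWatch:
    """Trigger/reset state for one sensor threshold (illustrative only).

    upper=True models -T: trip when value > trigger, re-arm when value < reset.
    upper=False models -t: trip when value < trigger, re-arm when value > reset.
    """
    trigger: float
    reset: float
    upper: bool = True
    triggered: bool = False

    def update(self, value: float) -> bool:
        """Feed one reading; return True only when a PR should be performed."""
        if self.triggered:
            rearm = value < self.reset if self.upper else value > self.reset
            if rearm:
                self.triggered = False
            return False  # no PR while triggered or on the re-arming reading
        tripped = value > self.trigger if self.upper else value < self.trigger
        if tripped:
            self.triggered = True
        return tripped
```

With -T 11:95.0:93.0, a reading of exactly 95.0 does not trip the sensor, readings of 95.5 and then 94.0 produce a single PR, and another PR is possible only after a reading below 93.0.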
-N, --no-defaults
By default, pacd monitors the same set of sensors that the BMC monitors and that could trigger a machine reboot. This set is typically all settable non-recoverable thresholds. Specifying this option tells pacd not to monitor these sensors. This option requires at least one -T or -t option to be specified.
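The dependency between -N and the threshold options can be expressed as an argument check. This Python argparse sketch mirrors only a few of pacd’s options for illustration; it is not pacd’s parser, and the program name pacd-sketch is made up.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a subset of pacd's options and enforce the -N rule."""
    parser = argparse.ArgumentParser(prog="pacd-sketch")
    parser.add_argument("-n", "--default-bitstream", action="append")
    parser.add_argument("-T", "--upper-sensor-threshold", action="append")
    parser.add_argument("-t", "--lower-sensor-threshold", action="append")
    parser.add_argument("-N", "--no-defaults", action="store_true")
    args = parser.parse_args(argv)
    # Without the default sensor set, at least one explicit threshold is needed.
    if args.no_defaults and not (args.upper_sensor_threshold
                                 or args.lower_sensor_threshold):
        parser.error("--no-defaults requires at least one -T or -t")
    return args
```

parser.error prints the message with a usage line and exits with status 2, so a command line such as -N alone is rejected before any monitoring starts.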
--driver-removal-disable
This is an advanced option. By default, when a sensor is first tripped and a PR of the FPGA is required, pacd removes the FPGA device driver for the device, waits for the cooldown interval (see -c), re-enables the driver, and then programs the default bitstream into the device via PR.
If this option is specified, pacd skips removing the driver and only performs the PR of the default bitstream.
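The two recovery paths differ only in whether the driver is cycled before PR. The following Python sketch shows the documented order of operations; the callback names are hypothetical stand-ins for the real driver and PR operations, and the sleep parameter exists only so the sequence can be exercised without waiting.

```python
import time
from typing import Callable


def respond_to_trip(remove_driver: Callable[[], None],
                    enable_driver: Callable[[], None],
                    program_default: Callable[[], None],
                    cooldown_secs: float,
                    driver_removal_disable: bool = False,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Order of operations after a sensor first trips (illustrative only)."""
    if not driver_removal_disable:
        remove_driver()        # host loses access to the PAC from here...
        sleep(cooldown_secs)   # ...for the -c/--cooldown-interval period
        enable_driver()
    program_default()          # PR the matching default bitstream
```

With driver_removal_disable=True only the PR step runs, which is also why a cooldown interval has no meaning in that mode.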
NOTES
pacd is intended to prevent an over-temperature or power “non-recoverable” event from causing the FPGA’s BMC to shut down the PAC. Shutting down the PAC results in a PCIe “surprise removal”, which ultimately causes the host to reboot.
There are several issues to consider when enabling pacd:
- The application being accelerated must be able to respond appropriately when the device driver disappears from the system. The application receives a SIGHUP signal when the driver shuts itself down. On receipt of SIGHUP, the application should clean up and exit as soon as possible.
- There is a window in which the running system will reboot if a PR is in progress when a sensor’s threshold is tripped.
- The OS and driver cannot invalidate any pointers that the application has to FPGA MMIO space. Utilization of the OPAE API to access the MMIO region is strongly recommended to avoid unanticipated reboots.
- The OS and driver cannot prevent the FPGA from directly accessing host memory, as in a DMA operation from the AFU to the host. If pacd attempts a PR in response to a threshold trip during a DMA operation, a reboot is highly likely.
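The first note above can be illustrated with a minimal application skeleton that treats SIGHUP as a request to clean up and exit. This Python sketch is an assumption about how such an application might be structured (AcceleratedApp, do_work, and cleanup are hypothetical); a real accelerator application would release its OPAE handles and buffers in cleanup. Installing a handler matters because the default action for SIGHUP terminates the process without any cleanup.

```python
import signal
from typing import Callable


class AcceleratedApp:
    """Skeleton of an application that exits cleanly when the driver goes away.

    The SIGHUP handler only sets a flag; cleanup runs in the main loop,
    outside signal-handler context.
    """

    def __init__(self, do_work: Callable[[], None],
                 cleanup: Callable[[], None]) -> None:
        self.do_work = do_work
        self.cleanup = cleanup
        self.hangup = False
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame) -> None:
        self.hangup = True

    def run(self, max_iterations: int = 1_000_000) -> int:
        """Run work units until SIGHUP arrives; return the number completed."""
        completed = 0
        while not self.hangup and completed < max_iterations:
            self.do_work()
            completed += 1
        # Release FPGA resources before exiting, as soon as possible.
        self.cleanup()
        return completed
```

Keeping each work unit short bounds how long the application keeps touching the device after SIGHUP arrives.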
TROUBLESHOOTING
If you encounter any issues, you can get debug information in two ways:
- By examining the log file when in daemon mode.
- By running in non-daemon mode and viewing stdout.
EXAMPLES
The following command starts pacd as a daemon process, programming my_null_bits.gbs when any BMC-triggerable threshold is tripped.
pacd --daemon --default-bitstream=my_null_bits.gbs
The following command starts pacd as a regular process, programming idle.gbs when sensor 11 (FPGA Core TEMP) exceeds 92.35 degrees C or sensor 0 (Total Input Power) goes outside the range [9.2, 19.9] Watts.
pacd -n idle.gbs -T 11:92.35 -T 0:19.9 -t 0:9.2
Revision History
Document Version | Intel Acceleration Stack Version | Changes |
---|---|---|
2018.08.08 | 1.2 Beta. (Supported with Intel Quartus Prime Pro Edition 17.1.) | Initial revision. |
2018.08.17 | 1.2 Beta. (Supported with Intel Quartus Prime Pro Edition 17.1.) | Updated to include new options. |