pacd
SYNOPSIS
pacd [--directory=<dir>] [--logfile=<file>] [--pidfile=<file>] [--umask=<mode>] [--default-bitstream=<file>] [--segment=<PCIeSegment>] [--bus=<bus>] [--device=<device>] [--function=<function>] [--upper-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--lower-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--poll-interval <sec>] [--cooldown-interval <sec>] [--no-defaults]
pacd [--default-bitstream=<file>] [--segment=<PCIeSegment>] [--bus=<bus>] [--device=<device>] [--function=<function>] [--upper-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--lower-sensor-threshold=<sensor>:<threshold>[:<reset_thresh>]] [--poll-interval <sec>] [--cooldown-interval <sec>] [--no-defaults]
DESCRIPTION
pacd periodically monitors the sensors on the Intel® Programmable
Acceleration Card (PAC) Board Management Controller (BMC) and programs a
default bitstream if a sensor value exceeds a specified threshold.
pacd is only available on the PCIe* Accelerator Cards (PACs).
On systems with multiple PACs, pacd monitors the sensors for all cards
in the system using the specified sensor threshold values. If you
specify PCIe address components using -S, -B, -D, or -F, pacd monitors
all PACs matching the components specified. For example, if you specify
only -B 5, pacd monitors all PACs on PCIe bus 5.
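The matching rule described above can be pictured as a filter in which every address component that is not given acts as a wildcard. The sketch below is illustrative only: the PciAddress type, the matches function, and the sample addresses are hypothetical and are not part of pacd.

```python
from typing import NamedTuple, Optional


class PciAddress(NamedTuple):
    """Hypothetical container for a PAC's PCIe address."""
    segment: int
    bus: int
    device: int
    function: int


def matches(addr: PciAddress,
            segment: Optional[int] = None,
            bus: Optional[int] = None,
            device: Optional[int] = None,
            function: Optional[int] = None) -> bool:
    """Return True if addr agrees with every component that was given.

    A component left as None (option not given) matches anything.
    """
    wanted = (segment, bus, device, function)
    return all(w is None or w == have for w, have in zip(wanted, addr))


# Made-up addresses for illustration.
pacs = [PciAddress(0, 5, 0, 0), PciAddress(0, 5, 1, 0), PciAddress(0, 6, 0, 0)]

# Similar in spirit to passing only "-B 5": every card on bus 5 is selected.
selected = [p for p in pacs if matches(p, bus=5)]
print(selected)
```

With no components given, every card matches, which corresponds to the default of monitoring all PACs.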
The sensor thresholds are global. Specifying -T 11:95.0:93.0 monitors
sensor 11 on all selected PACs, triggers if its value exceeds 95.0, and
resets the trigger when the value drops below 93.0.
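To make the <sensor>:<threshold>[:<reset_thresh>] layout concrete, here is a minimal parsing sketch. It is not pacd's own parser. The Threshold type and parse_threshold function are hypothetical, and the sketch assumes that sensor identifiers are integers, as in the examples in this document. It does not guess what pacd uses when the reset field is omitted and simply reports None.

```python
from typing import NamedTuple, Optional


class Threshold(NamedTuple):
    sensor: int              # assumed to be an integer sensor number
    trigger: float
    reset: Optional[float]   # None when the optional third field is omitted


def parse_threshold(spec: str) -> Threshold:
    """Parse '<sensor>:<threshold>[:<reset_thresh>]' into its fields."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(
            f"expected <sensor>:<threshold>[:<reset_thresh>], got {spec!r}")
    sensor = int(parts[0])
    trigger = float(parts[1])
    reset = float(parts[2]) if len(parts) == 3 else None
    return Threshold(sensor, trigger, reset)


print(parse_threshold("11:95.0:93.0"))
print(parse_threshold("4:12.5"))
```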
To stop pacd, send it SIGINT or SIGTERM. If pacd was started as a
service, stop it with systemctl; if it runs as a regular process in the
foreground, press ^C.
INSTALLING AS A SYSTEM SERVICE
The tools installation process installs all the files required to run
pacd as a systemd service. If you enable the service, it starts
automatically on boot.
To start pacd as a systemd service, first edit the file
/etc/sysconfig/pacd.conf as root. This file is shown below.
# Intel Programmable Acceleration Card (PAC) variables.
# Monitors Baseboard Management Controller (BMC) sensors.
############## REQUIRED OPTIONS ################
PIDFile=/tmp/pacd.pid
# Specify default GBS files to consider for PR. Include '-n' for each.
# ex.: DefaultGBSOptions=-n <Default_GBS_Path> -n <Default_GBS_PATH_2>
DefaultGBSOptions=-n <Default_GBS_Path>
UMask=0
LogFile=/tmp/pacd.log
PollInterval=0
CooldownInterval=0
############## OPTIONAL OPTIONS ################
# Uncomment and specify specific PAC PCI address to monitor.
# Default is to monitor all PACs
#BoardPCIAddr=-S 0 -B 5 -D 0 -F 0
# Specify threshold values. -T for UNR, -t for LNR.
# ex.: ThresholdOptions=-T 4:12.5 -t 7:2.25:2.3
ThresholdOptions=
# Extra advanced options.
# ex.: ExtraOptions=--no-defaults
ExtraOptions=
Edit the DefaultGBSOptions= line, specifying the absolute path(s) of
the Accelerator Function (AF) files to be loaded into the device when a
threshold is exceeded. Prefix each AF file name with -n.
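As a small illustration of the rule above, the following sketch assembles a DefaultGBSOptions= line from a list of AF files. The default_gbs_options helper is hypothetical, the second path is made up, and the sketch assumes paths contain no spaces because it does no quoting.

```python
import os


def default_gbs_options(paths):
    """Build a DefaultGBSOptions= line, prefixing each AF file with -n."""
    if not paths:
        raise ValueError("at least one AF file is required")
    for p in paths:
        # The configuration file expects absolute paths.
        if not os.path.isabs(p):
            raise ValueError(f"AF path must be absolute: {p!r}")
    return "DefaultGBSOptions=" + " ".join(f"-n {p}" for p in paths)


# /etc/GBSs/other.gbs is a made-up second file for illustration.
print(default_gbs_options(["/etc/GBSs/default.gbs", "/etc/GBSs/other.gbs"]))
```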
To start the service, first tell systemd to rescan for services using
the command sudo systemctl daemon-reload, then issue the command
sudo systemctl start pacd. The service then runs until it is stopped or
until the next boot. To stop the service, use sudo systemctl stop pacd.
For pacd to persist across boots, issue sudo systemctl enable pacd;
sudo systemctl disable pacd reverses this effect.
To verify that the service has started, use either
sudo systemctl status pacd -l or sudo journalctl -xe. If you use
systemctl, a successful startup displays output similar to the
following:
sudo systemctl status pacd -l
● pacd.service - PAC BMC sensor monitor
Loaded: loaded (/usr/lib/systemd/system/pacd.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2018-08-23 09:34:59 PDT; 2s ago
Process: 15694 ExecStart=/usr/local/bin/pacd -d $DefaultGBSOptions -P /usr/local/bin -m $UMask -l $LogFile -p $PIDFile -i $PollInterval -c $CooldownInterval $BoardPCIAddr $ThresholdOptions $ExtraOptions (code=exited, status=0/SUCCESS)
Main PID: 15698 (pacd)
CGroup: /system.slice/pacd.service
└─15698 /usr/local/bin/pacd -d -n /etc/GBSs/default.gbs -P /usr/local/bin -m 0 -l /tmp/pacd.log -p /tmp/pacd.pid -i 0 -c 0
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Starting PAC BMC sensor monitor...
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: registering default bitstream "/etc/GBSs/default.gbs"
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon path is /usr/local/bin
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon umask is 0x0
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon log file is /tmp/pacd.log
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon pid file is /tmp/pacd.pid
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Polling interval set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Cooldown delay set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Started PAC BMC sensor monitor.
The journalctl output is similar to the following:
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Starting PAC BMC sensor monitor...
-- Subject: Unit pacd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pacd.service has begun starting up.
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: registering NULL bitstream "/etc/GBSs/NULL.gbs"
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon path is /usr/local/bin
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon umask is 0x0
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon log file is /tmp/pacd.log
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: daemon pid file is /tmp/pacd.pid
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Polling interval set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl pacd[15694]: Thu Aug 23 09:34:59 2018: Cooldown delay set to 0.000000 sec
Aug 23 09:34:59 sj-avl-d15-mc.avl systemd[1]: Started PAC BMC sensor monitor.
-- Subject: Unit pacd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pacd.service has finished starting up.
--
-- The start-up result is done.
OPTIONS
-d, --daemon
This argument has been deprecated. The pacd command treats it as a
no-op.
-P, --directory <dir>
Run from the specified directory (path).
-l, --logfile <file>
Send output to file. If you specify a log file on the command line,
pacd writes to both that log file and to stdout, stderr, or syslog,
depending on whether pacd was started from the command line or as a
systemd service.
-p, --pidfile <file>
This argument has been deprecated. The pacd command treats it as a
no-op.
-m, --umask <mode>
Use the mode value as the file mode creation mask passed to umask.
-i, --poll-interval <secs>
pacd polls and checks the sensor values every secs seconds. This is a
real number, so you can specify a floating-point value such as 2.5 for
a two-and-a-half-second poll interval.
-c, --cooldown-interval <secs>
Specifies the time in seconds that pacd waits after removing the FPGA
driver before re-enabling the driver. During this period, the host
cannot access the PAC for any reason.
-n, --default-bitstream <file>
Specify the default bitstream to program when a sensor value exceeds the specified threshold. This option may be specified multiple times. The AF, if any, that matches the FPGA’s PR interface ID is programmed when the sensor’s value exceeds the threshold.
-S, --segment <PCIe segment>
Specify the PCIe segment (domain) of the PAC of interest.
-B, --bus <PCIe bus>
Specify the PCIe bus of the PAC of interest.
-D, --device <PCIe device>
Specify the PCIe device of the PAC of interest.
-F, --function <PCIe function>
Specify the PCIe function of the PAC of interest.
-T, --upper-sensor-threshold <sensor>:<trigger_threshold>[:<reset_threshold>]
Specify the threshold value for a sensor that, when exceeded (sensor
value strictly greater than <trigger_threshold>), causes the default
bitstream specified with -n that matches the FPGA’s PR Interface ID
to be programmed into the FPGA. The sensor is considered triggered (and
no PR is performed) until its value drops below <reset_threshold>.
You can specify this option multiple times.
The sensors specified are monitored for all specified PACs. There is no mechanism for specifying per-PAC sensor thresholds.
-t, --lower-sensor-threshold <sensor>:<trigger_threshold>[:<reset_threshold>]
Specify the threshold value for a sensor that, when crossed (sensor
value strictly less than <trigger_threshold>), causes the default
bitstream specified with -n that matches the FPGA’s PR Interface ID
to be programmed into the FPGA. The sensor is considered triggered (and
no PR is performed) until its value rises above <reset_threshold>.
You can specify this option multiple times.
The sensors specified are monitored for all specified PACs. There is no mechanism for specifying per-PAC sensor thresholds.
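The trigger and reset behavior described for -T and -t amounts to hysteresis: once a threshold fires, it stays latched until the value passes the reset threshold. The following is a minimal sketch of that idea, not pacd's implementation. The SensorThreshold class and the sample readings are hypothetical, and the sketch assumes an explicit reset threshold is always supplied.

```python
class SensorThreshold:
    """Track the triggered state of one sensor threshold with hysteresis."""

    def __init__(self, trigger: float, reset: float, upper: bool = True):
        self.trigger = trigger
        self.reset = reset
        self.upper = upper        # True models -T, False models -t
        self.triggered = False

    def update(self, value: float) -> bool:
        """Feed one reading; return True only when a new trigger fires."""
        if self.triggered:
            # Stay latched until the value passes the reset threshold.
            cleared = value < self.reset if self.upper else value > self.reset
            if cleared:
                self.triggered = False
            return False
        # Strict comparisons, matching the option descriptions.
        fired = value > self.trigger if self.upper else value < self.trigger
        if fired:
            self.triggered = True
        return fired


# Models -T 11:95.0:93.0 with made-up readings.
upper = SensorThreshold(95.0, 93.0, upper=True)
upper_events = [upper.update(v) for v in (94.0, 95.5, 96.0, 94.0, 92.5, 95.1)]
print(upper_events)

# Models -t 7:2.25:2.3 with made-up readings.
lower = SensorThreshold(2.25, 2.3, upper=False)
lower_events = [lower.update(v) for v in (2.4, 2.2, 2.28, 2.35, 2.1)]
print(lower_events)
```

In this sketch, a reading between the reset and trigger thresholds never changes state, which is what keeps a value hovering near the trigger from firing repeatedly.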
-N, --no-defaults
By default, pacd monitors the same set of sensors that the BMC
monitors. These are sensors that could trigger a machine reboot. This
set is typically all settable, non-recoverable thresholds. Specifying
this option tells pacd not to monitor these sensors. This option
requires at least one of -T or -t to be specified.
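The dependency between --no-defaults and the threshold options can be expressed as a simple post-parse check. The sketch below uses Python's argparse to mimic only these three options. It is not how pacd parses its command line, and the program name pacd-sketch and the parse helper are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser covering only -T, -t, and -N for illustration."""
    parser = argparse.ArgumentParser(prog="pacd-sketch")
    parser.add_argument("-T", "--upper-sensor-threshold",
                        action="append", default=[])
    parser.add_argument("-t", "--lower-sensor-threshold",
                        action="append", default=[])
    parser.add_argument("-N", "--no-defaults", action="store_true")
    return parser


def parse(argv):
    """Parse argv and enforce that -N comes with at least one -T or -t."""
    parser = build_parser()
    args = parser.parse_args(argv)
    has_threshold = bool(args.upper_sensor_threshold
                         or args.lower_sensor_threshold)
    if args.no_defaults and not has_threshold:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--no-defaults requires at least one of -T or -t")
    return args


args = parse(["-N", "-T", "11:95.0:93.0"])
print(args.no_defaults, args.upper_sensor_threshold)
```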
NOTES
pacd is intended to prevent an over-temperature or power
“non-recoverable” event from causing the FPGA’s BMC to shut down the
PAC. Shutting down the PAC results in a PCIe “surprise removal” which
ultimately causes the host to reboot.
The application being accelerated must be able to respond appropriately
when the device driver disappears from the system. The application
receives a SIGHUP signal when the driver shuts itself down. On receipt
of SIGHUP, the application should clean up and exit as soon as
possible.
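One common way for an application to honor this is to record the signal in the handler and let the main loop perform the cleanup. The following is a minimal, POSIX-only sketch of that pattern. The run function and its do_work and cleanup callbacks are hypothetical placeholders for application code, and nothing here is specific to the OPAE API.

```python
import signal
import time

# Set to True once SIGHUP has been received.
_stop = False


def on_sighup(signum, frame):
    """Signal handler: only record the request; real work happens in run()."""
    global _stop
    _stop = True


# Install the handler (must be done from the main thread).
signal.signal(signal.SIGHUP, on_sighup)


def run(do_work, cleanup, poll_seconds=0.1):
    """Call do_work repeatedly until SIGHUP arrives, then clean up and return."""
    try:
        while not _stop:
            do_work()
            time.sleep(poll_seconds)
    finally:
        # Release accelerator resources, close files, and so on.
        cleanup()
```

Keeping the handler this small avoids doing non-trivial work in signal context; the loop notices the flag on its next iteration and exits promptly.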
TROUBLESHOOTING
If you encounter any issues, you can get debug information by examining
stdout and the system logs (journalctl or dmesg).
EXAMPLES
The following command starts pacd as a regular process, programming
idle.gbs when sensor 11 (FPGA Core TEMP) exceeds 92.35 degrees C or
sensor 0 (Total Input Power) goes outside the range [9.2, 19.9] Watts.
pacd -n idle.gbs -T 11:92.35 -T 0:19.9 -t 0:9.2
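To see how the three thresholds in this command combine, the sketch below checks made-up sensor readings against them. The UPPER and LOWER tables mirror the -T and -t values from the command. The exceeded function and the readings are hypothetical, and the sketch ignores reset thresholds.

```python
# Thresholds taken from the example command line.
UPPER = {11: 92.35, 0: 19.9}   # -T: trigger when value is strictly greater
LOWER = {0: 9.2}               # -t: trigger when value is strictly less


def exceeded(readings):
    """Return (sensor, kind) pairs for every threshold the readings cross."""
    hits = []
    for sensor, value in readings.items():
        if sensor in UPPER and value > UPPER[sensor]:
            hits.append((sensor, "upper"))
        if sensor in LOWER and value < LOWER[sensor]:
            hits.append((sensor, "lower"))
    return hits


print(exceeded({11: 80.0, 0: 15.0}))   # temperature and power within limits
print(exceeded({11: 93.0, 0: 15.0}))   # core temperature too high
print(exceeded({11: 80.0, 0: 8.5}))    # input power below the range
```

Giving sensor 0 both a -T and a -t value is what turns two one-sided thresholds into a range check.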
Revision History
Document Version | Changes |
---|---|
2019.05.13 | Made the following changes: Removed the daemon argument. Removed the driver-removal-disable argument. Removed descriptions of three problems that are fixed in the current release. |
2018.08.17 | Updated to include new options. |
2018.08.08 | Initial revision. |