pwrkap Home Page

Hi. You have reached the home page for pwrkap 7.20. There will be more details soon.

See download page or git clone git://

pwrkap -- Energy Use Monitor and Power Cap Enforcement Tools
version 7.20 (2008-09-01)
Written by Darrick J. Wong.
(C) Copyright IBM Corp. 2008

This software is covered under the GNU GPL v2; see COPYING for details.

Overview
--------
This document attempts to describe the structure of the pwrkap software.
There are two big parts to this program--device and power meter
enumeration and the grouping of those devices into power domains; and the
part that builds a table describing the effects of device power-state
changes on the power consumption of that power domain.

Power Meters
------------
All power meter objects must implement the methods described in the
power_meter class. A reference implementation is provided with the source.
Power meter driver implementations will vary.

Managed Devices
---------------
All devices that are to be managed by pwrkap must implement the methods of
the device class. A reference implementation is provided with the source.
Device driver code will vary.

Discovery
---------
Code module, power meter, device and power domain discovery are all
handled by the same mechanism. Discovery for pwrkap is done quite
differently than it was in the past. All code modules to be loaded should
be listed in PCAP_DRIVERS; every module listed in that array is imported
when the program starts. Each code module should have a module-level
(non-indented) code snippet that adds discovery functions to the lists
PCAP_DEVICE_DISCOVERY, PCAP_METER_DISCOVERY, or
PCAP_POWER_DOMAIN_DISCOVERY. When discovery runs, all functions registered
in those lists are called in succession.

First devices are discovered, then power meters, and finally power domains
are created to map each power meter to the devices that it measures.

Device discovery has two parts. First, the devices should be enumerated
and put into PCAP_DEVICES. Second, device control domain information must
be discovered.
This means that all devices with their own set of controls should be put
into a device_domain, one domain per device. For sets of devices that
share the same controls, all the devices should be put into one
device_domain. Finally, all device_domain objects should be put into the
PCAP_DEVICE_DOMAINS list.

Power meters are discovered and should be put into PCAP_POWER_METERS.
Energy meters are discovered and should be put into PCAP_ENERGY_METERS.

Finally, meters and device control domains are united in the power domain
discovery function. Each power domain is created with links to three data
structures--the list of device_domain objects that are bound to the power
meter, something called an identical device profile domain, and the power
meter itself. The fourth argument is the initial power cap for the domain.
All power domains should be put in PCAP_POWER_DOMAINS; all devices, device
control domains, and meters claimed by the power domain should be removed
from PCAP_DEVICES, PCAP_DEVICE_DOMAINS, PCAP_ENERGY_METERS, and
PCAP_POWER_METERS, respectively.

The identical device profile domain, or idomain, identifies devices that
have identical power use profiles. This gives both the training program
and the cap enforcement algorithm some extra flexibility. Instead of
having to iterate through all power states of all devices in a domain for
training, the program uses the idomain data to test one device in the
idomain and apply its observations to all other devices in the idomain.
This drastically reduces training time and shrinks the power use effects
table. Please also note that even a device with a unique profile needs to
be placed in an idomain by itself.

The power domain discovery code is critical to correct operation of
pwrkap! Currently, there are two discovery paths--one for certain IBM
systems, and a simple one that lumps everything it finds into one power
domain provided there is only one power meter.
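
The discovery flow described above can be sketched roughly as follows. This
is only an illustration under stated assumptions: the PCAP_* list names and
device_domain come from the text, but the discover_* functions, the
run_discovery() helper, the tuple used to stand in for a power domain, and
all device and meter names are hypothetical.

```python
# Illustrative sketch only. The PCAP_* list names and device_domain come
# from the text above; every function and value here is hypothetical.

PCAP_DEVICE_DISCOVERY = []
PCAP_METER_DISCOVERY = []
PCAP_POWER_DOMAIN_DISCOVERY = []

PCAP_DEVICES = []
PCAP_DEVICE_DOMAINS = []
PCAP_POWER_METERS = []
PCAP_POWER_DOMAINS = []


class device_domain:
    """A set of devices that share one set of controls."""

    def __init__(self, devices):
        self.devices = list(devices)


def discover_example_cpus():
    """Enumerate devices, then group them into control domains."""
    cpus = ["cpu0", "cpu1"]
    PCAP_DEVICES.extend(cpus)
    # Assume each CPU has its own controls: one device_domain per device.
    PCAP_DEVICE_DOMAINS.extend(device_domain([cpu]) for cpu in cpus)


def discover_example_meter():
    PCAP_POWER_METERS.append("meter0")


def discover_single_power_domain():
    """Lump everything into one power domain if there is exactly one meter."""
    if len(PCAP_POWER_METERS) != 1:
        return
    meter = PCAP_POWER_METERS.pop()
    control_domains = list(PCAP_DEVICE_DOMAINS)
    # Assume both CPUs have identical power profiles: one shared idomain.
    idomains = [list(PCAP_DEVICES)]
    initial_cap_watts = 200  # arbitrary placeholder
    PCAP_POWER_DOMAINS.append(
        (control_domains, idomains, meter, initial_cap_watts))
    # Everything claimed by the power domain leaves the global lists.
    del PCAP_DEVICES[:]
    del PCAP_DEVICE_DOMAINS[:]


# Module-level registration, as each driver module would do on import.
PCAP_DEVICE_DISCOVERY.append(discover_example_cpus)
PCAP_METER_DISCOVERY.append(discover_example_meter)
PCAP_POWER_DOMAIN_DISCOVERY.append(discover_single_power_domain)


def run_discovery():
    """Call device discovery first, then meters, then power domains."""
    for registry in (PCAP_DEVICE_DISCOVERY, PCAP_METER_DISCOVERY,
                     PCAP_POWER_DOMAIN_DISCOVERY):
        for discover in registry:
            discover()


run_discovery()
```

After run_discovery() returns, PCAP_POWER_DOMAINS holds one entry and the
device, control domain, and meter lists are empty, since the power domain
has claimed everything in them.
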
Systems with multiple domains will need to provide their own power domain
discovery logic.

Inventories and Snapshots
-------------------------
Nearly all the data objects involved with pwrkap have two methods that
have not yet been discussed--inventory() and snapshot(). These two
functions are described below, followed by a discussion of call flow.

Generically, the inventory() function returns a description of the static
characteristics of the system--a hardware identifier of the power meter,
the list of supported CPU frequency states, etc. It is assumed that
changes in inventory do not happen within the life of the daemon; thus, a
change of this sort should result in the daemon discarding all training
data and starting anew. Typically these data are used to establish machine
capabilities.

The snapshot() function, then, returns a picture of dynamic system state
at the time of invocation. These items should be fairly volatile and are
used to compute the current state of the machine and where it should go
next.

pwrkap's controller object (discussed later) will call the inventory() or
snapshot() methods of power domain objects; it is the duty of these
objects to make the appropriate calls to the power meter and the control
domain objects; the control domain objects will (eventually) call the
methods in the device drivers. After that, the power domain object will
compile the subordinate objects' inventory or snapshot data into a single
report and return it.

Relating Power Use to Device Utilization and Power States
---------------------------------------------------------
The heart of pwrkap is a four-dimensional table that enables pwrkap to
guess what kind of impact a change in a device's power state will have on
the power domain's power use. This large-ish array is indexed like this:

    transition_table[idomain][dev_use][p_state][p_state] = power_change

The idomain field describes an identical device profile domain as outlined
above.
The dev_use index cuts the rest of the table into utilization buckets,
because changes between power states of certain devices (CPUs in
particular) have different effects on power use depending on the device's
utilization. The last two fields are used to index the current power state
of the device and a proposed new power state. The value stored in the
table is the average impact on power use given the four indexing factors.
This table can contain empty cells. By convention, the two p-state indices
are always ordered with the lower of the two coming first, as it is
assumed that reversing a transition negates its effect, i.e. a->b => c
and b->a => -c.

Training
--------
In the event that pwrkap finds itself lacking data relating power use to
device utilization and power state, a training algorithm is needed to
observe the data necessary to enforce the cap effectively. The training
algorithm is quite simple:

1. Load down the system so that the "100% Utilization" buckets are filled.
   This probably requires outside attention.
2. Set all devices to their lowest power state. This is done partly to
   avoid overloading the power mains and partly to work around cpufreq
   bugs in Linux.
3. In each power domain, identify one device from each idomain.
4. For all possible combinations of device and power state, take a
   snapshot of the power domain and process the snapshot. See Processing
   a Snapshot below for details.

When the daemon goes offline, the power domains and their transition
tables are written to disk via Python's cPickle mechanism. When the
program loads, the inventory of the saved data is compared to what is
discovered; if there is a mismatch, the saved data are discarded and the
training algorithm is started.
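
As a rough sketch of the transition table described above, the following
keeps a running average per (idomain, utilization bucket, lower p-state,
higher p-state) cell and flips the sign when a transition is recorded or
looked up in the reverse direction. The TransitionTable class, its method
names, the number of utilization buckets, and the use of plain numbers for
p-states are all assumptions made for illustration, not pwrkap's actual
data structure.

```python
from collections import defaultdict


class TransitionTable:
    """Hypothetical stand-in for the four-dimensional transition table."""

    def __init__(self, n_buckets=4):
        self.n_buckets = n_buckets        # assumed utilization bucket count
        self._sums = defaultdict(float)   # total observed power change
        self._counts = defaultdict(int)   # number of observations

    def _bucket(self, utilization):
        # Map utilization in [0.0, 1.0] onto a bucket index.
        return min(int(utilization * self.n_buckets), self.n_buckets - 1)

    def _key(self, idomain, utilization, old_state, new_state):
        # Store the lower p-state first; remember whether we flipped.
        low, high = sorted((old_state, new_state))
        sign = 1.0 if (old_state, new_state) == (low, high) else -1.0
        return (idomain, self._bucket(utilization), low, high), sign

    def record(self, idomain, utilization, old_state, new_state, change):
        key, sign = self._key(idomain, utilization, old_state, new_state)
        self._sums[key] += sign * change
        self._counts[key] += 1

    def lookup(self, idomain, utilization, old_state, new_state):
        """Return the average power change, or None for an empty cell."""
        key, sign = self._key(idomain, utilization, old_state, new_state)
        if key not in self._counts:
            return None
        return sign * self._sums[key] / self._counts[key]


table = TransitionTable()
# Two observations of the same pair of states, one in each direction.
table.record("cpu", 1.0, 1000, 2000, 12.0)
table.record("cpu", 1.0, 2000, 1000, -10.0)
raise_estimate = table.lookup("cpu", 1.0, 1000, 2000)
lower_estimate = table.lookup("cpu", 1.0, 2000, 1000)
idle_estimate = table.lookup("cpu", 0.1, 1000, 2000)
```

Storing only the ordered pair halves the number of cells to train, at the
cost of relying on the assumption that raising and lowering a power state
have equal and opposite effects.
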

Processing a Snapshot
---------------------
When snapshots are taken, either during training or during normal
operation of the pwrkap daemon, it is useful to augment the transition
database with the data that are being collected, thereby enabling the
software to adapt to an environment that changes over time.

Each power domain remembers the past few snapshots that were taken of that
domain. When a new snapshot is taken, it is compared to every snapshot
currently in the retention buffer. If it can be shown that the only
difference between the old and new snapshots is a change in the power
state of one device in an idomain, the differences in states and in power
use are noted in the transition table.

Enforcing Caps
--------------
Each power domain gets its own thread to run a control loop. The operation
of this control loop is as follows:

1. No more than once every MEASUREMENT_PERIOD seconds, take a snapshot of
   the power domain.
2. If less than ENFORCEMENT_INTERVAL seconds have passed since the last
   enforcement attempt, go to sleep and restart at step 1.
3. Compute the difference between the domain's power cap and power use.
4. If use is less than cap, find all transitions that increase
   performance. If use is more than cap, find all transitions that
   decrease use. If they are equal, go back to step 1.
5. Find the transition with the most positive performance increase for the
   most negative increase in power use.
6. Implement the power state change specified by the transition and return
   to step 3.

Talking to Clients
------------------
Communication with the pwrkap client is achieved through three
components--controller, sockmux and lazy_log. The controller accepts
incoming commands from sockmux and modifies the power domains as
necessary. The lazy log receives new snapshots from the domains as they
run through their control loops. While forwarding new log entries to the
sockmux, the lazy log also retains the last few log entries, which are
sent to new clients.
Finally, the sockmux dispatches incoming commands to the controller, sends
data from the lazy log to all clients, and deals with the underlying
socket plumbing.

Communications across sockets are done with Python objects in cPickle
format. When a client first connects to a pwrkap server, it is sent a list
of power domain inventories and the log entries retained in the lazy log.

The Powercap Client
-------------------
The pwrkap client connects to the server and receives a list of power
domain inventories and recent log entries. After that, the client should
listen for incoming power domain snapshots; it may also send commands to
the pwrkap daemon.

The CLI client dumps all incoming data to the console in raw format.
Anything typed into it is cPickle'd and sent to the pwrkap server.

The GTK client is a bit more sophisticated. In addition to being able to
watch multiple pwrkap daemons simultaneously, it employs GTK controls to
switch the view between power domains (or aggregates of power domains). It
also interfaces with matplotlib to draw a graph of the past few minutes of
power cap, power use, and overall domain utilization.

Most of the client code is fairly straightforward and not worth mentioning
here, with the exception of the domain aggregator.

Domain Aggregator
-----------------
The domain aggregator is used by the GTK client to roll the power cap, use
and domain utilization data of many domains into a single report. It can
also be used to set the power cap of a large number of machines; in that
case, the power cap of each machine is scaled up or down to maintain the
same proportion between domain power cap and overall power cap.

To aggregate data for graphing, the historical data of several power
domains must be blended together. Individual samples are taken in
timestamp order from each domain; the graphing interval is then split into
small ranges of time.
For each time range, the power use and cap of each domain are added and
the utilizations are averaged to create the aggregate's profile for that
time stamp. This much smaller set of data is then graphed.

Wire Protocols
--------------
The wire protocol in use between the server and client software is a
simple one; both encode Python objects in pickle format and send that
across the wire. The handshake protocol looks approximately like this:

1. Management client connects to pwrkap daemon.
2. pwrkap daemon sends a list of tuples of the format
   (domain name, inventory).
3. pwrkap daemon replays the last few log entries one by one. Log entries
   are of the format (UTC timestamp, (domain name, snapshot)).
4. pwrkap daemon sends (UTC timestamp, "live") to indicate that all data
   after this point are live. The server's UTC timestamp can be used to
   synchronize with the client's clock, since all log entries have UTC
   timestamps.
5. pwrkap daemon sends live snapshots of the form
   (UTC timestamp, (domain name, snapshot)).
6. Client software can send commands as an array of strings at any time.
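
The handshake steps above can be sketched as follows, using Python 3's
pickle module (cPickle is the Python 2 name) and an in-memory buffer in
place of a real socket. The send_handshake() and read_handshake() helpers
and the contents of the inventory and snapshot dictionaries are
hypothetical; only the tuple shapes come from the list above. Note that
unpickling data from an untrusted peer can execute arbitrary code, so a
scheme like this is only suitable between trusted hosts.

```python
import io
import pickle
import time


def send_handshake(stream, inventories, retained_log):
    """Daemon side: inventories, replayed log entries, then the marker."""
    pickle.dump(inventories, stream)         # [(domain name, inventory)]
    for entry in retained_log:               # (timestamp, (name, snapshot))
        pickle.dump(entry, stream)
    pickle.dump((time.time(), "live"), stream)


def read_handshake(stream):
    """Client side: read until the "live" marker arrives."""
    inventories = pickle.load(stream)
    replayed = []
    while True:
        timestamp, payload = pickle.load(stream)
        if payload == "live":
            # Everything after this point is a live snapshot.
            return inventories, replayed, timestamp
        replayed.append((timestamp, payload))


# An in-memory buffer stands in for a connected socket's file object.
wire = io.BytesIO()
send_handshake(
    wire,
    [("domain0", {"meter": "meter0"})],
    [(1231718400.0, ("domain0", {"power": 150}))],
)
wire.seek(0)
inventories, replayed, server_time = read_handshake(wire)
```

With a real connection, the file object returned by socket.makefile("rwb")
could be passed in place of the buffer.
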

Last updated: 12 January 2009.