KOJAK Patterns

General Patterns

 

Time

Keywords:
CPU allocation time
Unit:
Seconds
Description:
Time spent on program execution, including the idle times of CPUs reserved for slave threads during OpenMP sequential execution. The total assumes that every thread of a process is allocated a separate CPU during the entire runtime of the process.
Parent:
None
Children:
Execution, Overhead, Idle Threads

Execution

Keywords:
Execution time
Unit:
Seconds
Description:
Time spent on program execution but without the idle times of slave threads during OpenMP sequential execution. Note that for pure MPI applications, this pattern is equal to Time.
Parent:
Time
Children:
MPI, OpenMP, SHMEM
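
The relationship between Time, Execution and Idle Threads can be sketched with a small calculation. This is only an illustration under simplifying assumptions: a single process, a fixed number of threads, hypothetical phase durations, and no trace-generation Overhead. The function name and its parameters are made up for this example and are not part of KOJAK.

```python
def cpu_allocation_time(num_threads, sequential_secs, parallel_secs):
    """Split the CPU allocation time of one process into Execution and Idle Threads.

    Assumptions: every thread owns a CPU for the whole run, only the master
    thread works during sequential phases, and Overhead is ignored.
    """
    wall = sequential_secs + parallel_secs
    total = num_threads * wall                   # Time
    idle = (num_threads - 1) * sequential_secs   # Idle Threads
    execution = total - idle                     # Execution
    return {"time": total, "execution": execution, "idle_threads": idle}


# Hypothetical run: 4 threads, 2 s sequential, 8 s inside parallel regions.
result = cpu_allocation_time(num_threads=4, sequential_secs=2.0, parallel_secs=8.0)
```

With a single thread there are no slave threads, so the idle share is zero and Execution equals Time, which matches the note about pure MPI applications.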

MPI Patterns

MPI

Keywords:
MPI
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI calls.
Parent:
Execution
Children:
Communication, IO (MPI), Synchronization (MPI)

Communication

Keywords:
MPI, communication
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI communication calls.
Parent:
MPI
Children:
Collective (MPI), Point-to-Point (MPI), RMA Communication (MPI-2)

Collective (MPI)

Keywords:
MPI, collective communication
Unit:
Seconds
Description:
Time spent on MPI collective communication.
Parent:
Communication (MPI)
Children:
Early Reduce, Late Broadcast (MPI), Wait at N x N (MPI)

Early Reduce

Keywords:
MPI, n-to-1 communication
Unit:
Seconds
Description:
Collective communication operations that send data from all processes to one destination process (i.e., n-to-1) may suffer from waiting times if the destination process enters the operation earlier than its sending counterparts, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation. It applies to MPI calls MPI_Reduce(), MPI_Gather() and MPI_Gatherv().
Parent:
Collective (MPI)
Children:
None
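
As a minimal sketch of the Early Reduce situation, the waiting time at the destination can be estimated from the times at which processes enter the operation. The timestamps below are hypothetical, and the rule that the wait ends when the earliest sender enters is an assumption made for illustration, not a statement about how KOJAK evaluates traces.

```python
def early_reduce_wait(root_enter, sender_enters):
    """Waiting time at the destination (root) of an n-to-1 operation.

    Assumption: the root can make progress as soon as the earliest sender
    has entered the operation, so the wait ends at min(sender_enters).
    """
    first_sender = min(sender_enters)
    return max(0.0, first_sender - root_enter)


# Hypothetical enter timestamps in seconds.
wait = early_reduce_wait(root_enter=1.0, sender_enters=[3.5, 2.0, 4.0])
```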

Late Broadcast (MPI)

Keywords:
MPI, 1-to-n communication
Unit:
Seconds
Description:
Collective communication operations that send data from one source process to all processes (i.e., 1-to-n) may suffer from waiting times if destination processes enter the operation earlier than the source process, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation. It applies to MPI calls MPI_Bcast(), MPI_Scatter() and MPI_Scatterv().
Parent:
Collective (MPI)
Children:
None
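
The Late Broadcast waiting time can be illustrated per destination process. The sketch assumes that a destination cannot receive anything before the source has entered the operation; the ranks and timestamps are hypothetical.

```python
def late_broadcast_waits(root_enter, dest_enters):
    """Waiting time of each destination in a 1-to-n operation.

    dest_enters maps a rank to the time it entered the operation.
    A destination that enters after the source does not wait at all.
    """
    return {rank: max(0.0, root_enter - t) for rank, t in dest_enters.items()}


# Hypothetical enter timestamps in seconds; the source enters at 5.0.
waits = late_broadcast_waits(root_enter=5.0, dest_enters={1: 3.0, 2: 4.5, 3: 6.0})
```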

Wait at N x N (MPI)

Keywords:
MPI, n-to-n communication
Unit:
Seconds
Description:
Collective communication operations that send data from all processes to all processes (i.e., n-to-n) exhibit an inherent synchronization among all participants, that is, no process can finish the operation until the last process has started it. This pattern covers the time spent in n-to-n operations until all processes have reached it. It applies to MPI calls MPI_Reduce_scatter(), MPI_Allgather(), MPI_Allgatherv(), MPI_Allreduce(), MPI_Alltoall(), MPI_Alltoallv().
Parent:
Collective (MPI)
Children:
None
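
Because no participant can finish before the last one has started, the waiting time in an n-to-n operation can be sketched as the gap between each process's entry and the latest entry. The timestamps are hypothetical and the function name is illustrative only.

```python
def wait_at_nxn(enter_times):
    """Per-process waiting time in an n-to-n collective.

    Each process waits from its own entry until the last participant
    has entered, that is, until max(enter_times).
    """
    last_enter = max(enter_times)
    return [last_enter - t for t in enter_times]


# Hypothetical enter timestamps in seconds for three processes.
waits = wait_at_nxn([1.0, 2.5, 4.0])
```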

Point-to-Point

Keywords:
MPI, point-to-point communication
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI point-to-point communication calls.
Parent:
Communication (MPI)
Children:
Late Receiver, Late Sender

Late Receiver

Keywords:
MPI, delayed sender
Unit:
Seconds
Description:
A send operation is blocked until the corresponding receive operation is called. This can happen for several reasons. Either the MPI implementation is working in synchronous mode by default, or the size of the message to be sent exceeds the available MPI-internal buffer space and the operation is blocked until the data is transferred to the receiver. The pattern refers to the time spent waiting as a result of this situation.
Parent:
Point-to-Point
Children:
Messages in Wrong Order (Late Receiver)

Messages in Wrong Order (Late Receiver)

Keywords:
MPI, sending order of messages
Unit:
Seconds
Description:
A Late Receiver situation may be the result of messages that are sent in the wrong order. If a process sends messages to processes that are not ready to receive them, the sender's MPI-internal buffer may overflow so that from then on the process needs to send in synchronous mode causing a Late Receiver situation. This pattern refers to the time spent in a wait state as a result of this situation.
Parent:
Late Receiver
Children:
None

Late Sender

Keywords:
MPI, delayed receiver
Unit:
Seconds
Description:
Time lost waiting in a blocking receive operation (e.g., MPI_Recv() or MPI_Wait()) that is posted earlier than the corresponding send operation.
Parent:
Point-to-Point
Children:
Messages in Wrong Order (Late Sender)
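
A Late Sender wait can be sketched from two timestamps per message: when the receive was posted and when the matching send started. The values below are hypothetical, and the sketch ignores transfer time.

```python
def late_sender_wait(recv_enter, send_enter):
    """Time a blocking receive spends waiting for a send that starts later."""
    return max(0.0, send_enter - recv_enter)


# Hypothetical (recv_enter, send_enter) timestamps in seconds for three messages.
messages = [(1.0, 3.0), (4.0, 3.5), (6.0, 6.25)]
waits = [late_sender_wait(r, s) for r, s in messages]
total_wait = sum(waits)
```

Only the first and third messages contribute here; in the second case the send was already under way when the receive was posted.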

Messages in Wrong Order (Late Sender)

Keywords:
MPI, acceptance order of messages
Unit:
Seconds
Description:
A Late Sender situation may be the result of messages that are received in the wrong order. If a process expects messages from one or more processes in a certain order, although these processes are sending them in a different order, the receiver may need to wait for a message if it tries to receive a message early that has been sent late. The situation can be avoided by receiving messages in the order in which they are sent instead. This pattern refers to the time spent in a wait state as a result of this situation.
Parent:
Late Sender
Children:
None
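
The effect of the receive order can be illustrated with a toy model. It assumes that a blocking receive completes as soon as its message has been sent and that the receiver spends a fixed, hypothetical amount of work on each message afterwards. The message names, times and the `work` parameter are invented for this sketch.

```python
def total_receive_wait(send_times, receive_order, work=0.0, start=0.0):
    """Sum the time a receiver blocks when receiving messages in a given order.

    send_times maps a message name to the time it is sent. After each
    receive, the receiver is busy for `work` seconds before the next one.
    """
    now = start
    waited = 0.0
    for msg in receive_order:
        sent = send_times[msg]
        if sent > now:
            waited += sent - now   # blocked until the message is sent
            now = sent
        now += work                # process the received message
    return waited


send_times = {"A": 1.0, "B": 5.0}
wrong_order = total_receive_wait(send_times, ["B", "A"], work=2.0)
send_order = total_receive_wait(send_times, ["A", "B"], work=2.0)
```

In this model, receiving in the order of sending lets the work on message A overlap with the delay of message B, so less time is spent blocked.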

RMA Communication (MPI-2)

Keywords:
MPI-2, RMA, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI RMA communication calls. RMA communication calls are MPI_Get(), MPI_Put() and MPI_Accumulate().
Parent:
Communication (MPI)
Children:
Early Transfer

Early Transfer

Keywords:
MPI-2, RMA, Remote Memory Access, 1-sided communication
Unit:
Seconds
Description:
Time lost waiting in a blocking RMA transfer operation (e.g., MPI_Get() or MPI_Put()) that is posted before the corresponding exposure epoch begins.
Parent:
RMA Communication (MPI-2)
Children:
None

IO (MPI)

Keywords:
MPI, IO
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI IO calls.
Parent:
MPI
Children:
None

Synchronization (MPI)

Keywords:
MPI, barrier
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI barriers and RMA synchronization calls.
Parent:
MPI
Children:
Barrier (MPI), RMA Synchronization, Init/Exit (MPI)

Barrier (MPI)

Keywords:
MPI, synchronization
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI barriers.
Parent:
Synchronization (MPI)
Children:
Barrier Completion (MPI), Wait at Barrier (MPI)

Barrier Completion (MPI)

Keywords:
MPI, synchronization
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI barriers after the first process has left the operation.
Parent:
Barrier (MPI)
Children:
None

Wait at Barrier (MPI)

Keywords:
MPI, barrier
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an MPI barrier, which is the time inside the barrier call until the last process has reached the barrier. A large amount of waiting time spent in front of barriers can be an indication of load imbalance.
Parent:
Barrier (MPI)
Children:
None
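
The two barrier sub-patterns can be sketched from the enter and exit times of each process. This is an illustration with hypothetical timestamps: waiting is counted until the last process has entered, and completion is counted from the moment the first process has left.

```python
def barrier_breakdown(enters, exits):
    """Split barrier time per process into waiting and completion shares.

    wait:       from a process's entry until the last process has entered
    completion: from the first exit of any process until this process exits
    """
    last_enter = max(enters)
    first_exit = min(exits)
    return [
        {"wait": last_enter - enter, "completion": leave - first_exit}
        for enter, leave in zip(enters, exits)
    ]


# Hypothetical enter/exit timestamps in seconds for three processes.
breakdown = barrier_breakdown(enters=[1.0, 2.0, 4.0], exits=[4.5, 4.25, 5.0])
```

A large spread in the `wait` values across processes is the kind of signal that the description associates with load imbalance.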

RMA Synchronization

Keywords:
MPI-2, RMA, Synchronization, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI RMA synchronization calls. RMA synchronization calls are MPI_Win_fence(), MPI_Win_lock(), MPI_Win_unlock(), MPI_Win_post(), MPI_Win_wait(), MPI_Win_test(), MPI_Win_start(), MPI_Win_complete(), MPI_Win_create() and MPI_Win_free().
Parent:
Synchronization (MPI)
Children:
Window Management, Fence, General Active Target Synchronization, Passive Target Synchronization (Locks)

Window Management

Keywords:
MPI-2, RMA, Window, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern refers to the time spent in collective window construction/destruction calls: MPI_Win_create() and MPI_Win_free().
Parent:
RMA Synchronization
Children:
Wait at Create, Wait at Free

Wait at Create

Keywords:
MPI-2, RMA, Window, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an MPI_Win_create(), which is the time inside the collective window creation call until the last process has reached the MPI_Win_create(). A large amount of waiting time spent in front of MPI_Win_create() can be an indication of load imbalance.
Parent:
Window Management
Children:
None

Wait at Free

Keywords:
MPI-2, RMA, Window, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an MPI_Win_free(), which is the time inside the collective window destruction call until the last process has reached the MPI_Win_free(). A large amount of waiting time spent in front of MPI_Win_free() can be an indication of load imbalance.
Parent:
Window Management
Children:
None

Fence

Keywords:
MPI-2, RMA, Collective Synchronization, Fence, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern refers to the time spent in the collective RMA synchronization call MPI_Win_fence().
Parent:
RMA Synchronization
Children:
Wait at Fence

Wait at Fence

Keywords:
MPI-2, RMA, Collective Synchronization, Fence, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an MPI_Win_fence(), which is the time inside the collective synchronization call until the last process has reached the MPI_Win_fence(). A large amount of waiting time spent in front of MPI_Win_fence() can be an indication of load imbalance.
Parent:
Fence
Children:
None

General Active Target Synchronization

Keywords:
MPI-2, RMA, Synchronization, GATS, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern refers to the time spent in general active target synchronization calls. These are MPI_Win_post(), MPI_Win_wait(), MPI_Win_test(), MPI_Win_start() and MPI_Win_complete().
Parent:
RMA Synchronization
Children:
Early Wait, Late Post

Early Wait

Keywords:
MPI-2, RMA, Synchronization, GATS, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
Time lost in an MPI_Win_wait() call, which blocks until all matching calls to MPI_Win_complete() have occurred. Part of the lost time can be caused by Late Complete.
Parent:
General Active Target Synchronization
Children:
Late Complete

Late Complete

Keywords:
MPI-2, RMA, Synchronization, GATS, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
The end of the exposure epoch, marked by an MPI_Win_wait() call, is delayed because one or more MPI_Win_complete() calls are executed too late (i.e., not immediately after the last communication call).
Parent:
Early Wait
Children:
None
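
One plausible way to relate Early Wait and Late Complete is sketched below. It is an assumption made for illustration rather than the tool's documented calculation: the Early Wait time runs from the start of MPI_Win_wait() until the last matching MPI_Win_complete(), and the Late Complete share is the part of that interval that follows the last communication call of the origin that completes last. All timestamps are hypothetical.

```python
def early_wait_breakdown(wait_enter, last_transfer_end, last_complete):
    """Estimate Early Wait and the share attributable to Late Complete.

    wait_enter:        time the target enters MPI_Win_wait()
    last_transfer_end: end of the last RMA communication call of the
                       origin whose MPI_Win_complete() comes last
    last_complete:     time of that last MPI_Win_complete()
    """
    early_wait = max(0.0, last_complete - wait_enter)
    # Only the delay after both the wait began and the transfers ended counts.
    late_complete = max(0.0, last_complete - max(wait_enter, last_transfer_end))
    return {"early_wait": early_wait, "late_complete": late_complete}


# Hypothetical timestamps in seconds.
shares = early_wait_breakdown(wait_enter=2.0, last_transfer_end=3.0, last_complete=6.0)
```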

Late Post

Keywords:
MPI-2, RMA, Synchronization, GATS, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
Access to the target window is delayed in an RMA synchronization call, either MPI_Win_start() or MPI_Win_complete(), until the window is exposed.
Parent:
General Active Target Synchronization
Children:
None

Passive Target Synchronization (Locks)

Keywords:
MPI-2, RMA, Synchronization, Locks, Remote Memory Access, 1-Sided Communication
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI_Win_lock() and MPI_Win_unlock() function calls.
Parent:
RMA Synchronization
Children:
None

Init/Exit (MPI)

Keywords:
MPI, initialize, finalize
Unit:
Seconds
Description:
This pattern refers to the time spent on MPI initialization and finalization calls. It applies to MPI_Init() and MPI_Finalize() calls.
Parent:
Synchronization (MPI)
Children:
None

OpenMP Patterns

OpenMP

Keywords:
OpenMP
Unit:
Seconds
Description:
Time spent on behalf of OpenMP. This includes time spent in OpenMP API calls as well as time spent in code generated by the OpenMP compiler.
Parent:
Execution
Children:
Flush, Fork, Synchronization (OpenMP)

Flush

Keywords:
OpenMP, flush directive
Unit:
Seconds
Description:
Time spent in OpenMP flush directives.
Parent:
OpenMP
Children:
None

Fork

Keywords:
OpenMP, team creation
Unit:
Seconds
Description:
Time spent by the master thread creating a team of threads.
Parent:
OpenMP
Children:
None

Synchronization (OpenMP)

Keywords:
OpenMP, synchronization
Unit:
Seconds
Description:
Time spent in OpenMP barrier or lock synchronization. Lock synchronization may be accomplished using either API calls or critical sections.
Parent:
OpenMP
Children:
Barrier (OpenMP), Lock Competition (OpenMP)

Barrier (OpenMP)

Keywords:
OpenMP, barrier
Unit:
Seconds
Description:
This pattern refers to the time spent in implicit (compiler-generated) or explicit (user-specified) OpenMP barrier synchronization. Note that during measurement implicit barriers are treated similarly to explicit ones. The instrumentation procedure replaces an implicit barrier with an explicit barrier enclosed by the parallel construct. This is done by adding a nowait clause and a barrier directive as the last statement of the parallel construct. In cases where the implicit barrier cannot be removed (i.e., at the end of a parallel region), the explicit barrier is executed in front of the implicit barrier, which will then be negligible because the team will already be synchronized when reaching it. The synthetic explicit barrier appears in the display as a special implicit barrier construct.
Parent:
Synchronization (OpenMP)
Children:
Explicit, Implicit

Explicit

Keywords:
OpenMP, explicit barrier
Unit:
Seconds
Description:
Time spent in explicit (i.e., user-specified) OpenMP barriers.
Parent:
Barrier (OpenMP)
Children:
Wait at Barrier (Explicit)

Wait at Barrier (Explicit)

Keywords:
OpenMP, explicit barrier
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an explicit (user-specified) OpenMP barrier. It refers to the time spent in the barrier until all threads have reached it.
Parent:
Explicit
Children:
None

Implicit

Keywords:
OpenMP, implicit barrier
Unit:
Seconds
Description:
Time spent in implicit (i.e., compiler-generated) OpenMP barriers.
Parent:
Barrier (OpenMP)
Children:
Wait at Barrier (Implicit)

Wait at Barrier (Implicit)

Keywords:
OpenMP, implicit barrier
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an implicit (compiler-generated) OpenMP barrier. It refers to the time spent in the barrier until all threads have reached it.
Parent:
Implicit
Children:
None

Lock Competition (OpenMP)

Keywords:
OpenMP, lock synchronization
Unit:
Seconds
Description:
This pattern refers to the time a thread spent waiting for a lock that had been previously acquired by another thread. The lock may have been acquired either transparently at the beginning of a critical section or using an explicit API call.
Parent:
Synchronization (OpenMP)
Children:
API Lock Synchronization, Critical

API Lock Synchronization

Keywords:
OpenMP, API lock routines
Unit:
Seconds
Description:
This pattern refers to the time a thread spent in an OpenMP API lock routine waiting for a lock that had been previously acquired by another thread.
Parent:
Lock Competition (OpenMP)
Children:
None

Critical

Keywords:
OpenMP, critical section
Unit:
Seconds
Description:
This pattern refers to the time spent waiting in front of a critical section occupied by another thread.
Parent:
Lock Competition (OpenMP)
Children:
None

SHMEM Patterns

SHMEM

Keywords:
SHMEM
Unit:
Seconds
Description:
Time spent in SHMEM API calls.
Parent:
Execution
Children:
Communication (SHMEM), Synchronization (SHMEM)

Communication (SHMEM)

Keywords:
SHMEM, communication
Unit:
Seconds
Description:
This pattern refers to the time spent in SHMEM RMA, collective and atomic communication calls. SHMEM RMA calls are get and put transfer calls.
Parent:
SHMEM
Children:
Collective (SHMEM), RMA Communication (SHMEM)

Collective (SHMEM)

Keywords:
SHMEM, collective communication
Unit:
Seconds
Description:
Time spent on SHMEM collective communication. It applies to SHMEM calls: shmem_broadcast(), shmem_broadcast_all(), shmem_and(), shmem_max(), shmem_min(), shmem_or(), shmem_prod(), shmem_sum(), shmem_xor(), shmem_collect() and shmem_fcollect().
Parent:
Communication (SHMEM)
Children:
Late Broadcast (SHMEM), Wait at N x N (SHMEM)

Late Broadcast (SHMEM)

Keywords:
SHMEM, 1-to-n communication, Broadcast
Unit:
Seconds
Description:
Collective communication operations that send data from one source process to all processes (i.e., 1-to-n) may suffer from waiting times if destination processes enter the operation earlier than the source process, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation.
Parent:
Collective (SHMEM)
Children:
None

Wait at N x N (SHMEM)

Keywords:
SHMEM, n-to-n communication
Unit:
Seconds
Description:
Collective communication operations that send data from all processes to all processes (i.e., n-to-n) exhibit an inherent synchronization among all participants, that is, no process can finish the operation until the last process has started it. This pattern covers the time spent in n-to-n operations until all processes have reached it. It applies to SHMEM calls: shmem_and(), shmem_max(), shmem_min(), shmem_or(), shmem_prod(), shmem_sum(), shmem_xor(), shmem_collect() and shmem_fcollect().
Parent:
Collective (SHMEM)
Children:
None

RMA Communication (SHMEM)

Keywords:
SHMEM, RMA, 1-Sided Communication
Unit:
Seconds
Description:
This pattern refers to the time spent in SHMEM RMA communication calls. RMA communication calls are SHMEM get and put transfers and SHMEM atomic operations. The atomic operations are shmem_swap(), shmem_cswap(), shmem_mswap(), shmem_inc(), shmem_finc(), shmem_add() and shmem_fadd().
Parent:
Communication (SHMEM)
Children:
None

Synchronization (SHMEM)

Keywords:
SHMEM
Unit:
Seconds
Description:
This pattern refers to the time spent in SHMEM synchronization calls. This applies to SHMEM barriers, point-to-point synchronization and management function calls.
Parent:
SHMEM
Children:
Barrier (SHMEM), P2P Synchronization, Init/Exit (SHMEM), Memory Management (SHMEM)

Barrier (SHMEM)

Keywords:
SHMEM, synchronization
Unit:
Seconds
Description:
This pattern refers to the time spent in SHMEM barriers.
Parent:
Synchronization (SHMEM)
Children:
Wait at Barrier (SHMEM)

Wait at Barrier (SHMEM)

Keywords:
SHMEM, barrier
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of a SHMEM barrier, which is the time inside the barrier call until the last process has reached the barrier. A large amount of waiting time spent in front of barriers can be an indication of load imbalance.
Parent:
Barrier (SHMEM)
Children:
None

P2P Synchronization

Keywords:
SHMEM, RMA
Unit:
Seconds
Description:
This pattern refers to the time spent in SHMEM point-to-point synchronization calls.
Parent:
Synchronization (SHMEM)
Children:
Lock Competition (SHMEM), Wait Until

Lock Competition (SHMEM)

Keywords:
SHMEM, lock synchronization
Unit:
Seconds
Description:
This pattern refers to the time a PE spent waiting for a lock that had been previously acquired by another PE.
Parent:
P2P Synchronization
Children:
None

Wait Until

Keywords:
SHMEM, wait, wait_until, synchronization
Unit:
Seconds
Description:
This pattern refers to the time spent waiting for a shared variable to be changed by a remote write or atomic swap issued by a different PE. It applies to the SHMEM calls shmem_wait() and shmem_wait_until().
Parent:
P2P Synchronization
Children:
None

Init/Exit (SHMEM)

Keywords:
SHMEM, initialize, finalize
Unit:
Seconds
Description:
This pattern refers to the time spent on SHMEM initialization and finalization calls. It applies to shmem_init() and shmem_finalize() calls.
Parent:
Synchronization (SHMEM)
Children:
None

Memory Management (SHMEM)

Keywords:
SHMEM, memory allocation, reallocation, free
Unit:
Seconds
Description:
This pattern refers to the time spent on SHMEM memory management calls. It applies to shmalloc(), shmalloc_nb(), shfree() and shrealloc() calls.
Parent:
Synchronization (SHMEM)
Children:
None

Idle Threads

Keywords:
OpenMP, sequential execution
Unit:
Seconds
Description:
This pattern refers to idle times on CPUs reserved for slave threads when a process is executed sequentially before or after an OpenMP parallel region.
Parent:
Time
Children:
None

Overhead

Keywords:
Trace generation overhead
Unit:
Seconds
Description:
Time spent performing major tasks related to trace generation, such as time synchronization or dumping the trace-buffer contents to a file. Note that the normal per-event overhead is not included.
Parent:
Time
Children:
None

Visits

Keywords:
Function calls
Unit:
Number of visits
Description:
Number of times a certain call path has been visited.
Parent:
None
Children:
None

CPU & Memory Patterns

Processor Cycles Patterns

CYCLES

Keywords:
Hardware counter
Unit:
Number of processor cycles
Description:
Total processor cycles
Parent:
None
Children:
BUSY + IDLE + STALL

Instruction Patterns

 

INSTRUCTION

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Total instructions completed
Parent:
None
Children:
BRANCH + FLOATING_POINT + INTEGER + MEMORY + VECTOR
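
The "+" notation in the Children field suggests that the child counters together make up the parent. Under that reading, the part of a parent counter that no child accounts for can be sketched as a simple difference. The counts below are hypothetical, and on real hardware the classes may overlap or be unavailable, so the remainder should be read as an approximation.

```python
def unclassified(parent_count, child_counts):
    """Part of a parent counter that none of the child counters account for."""
    return parent_count - sum(child_counts.values())


# Hypothetical counts for one call path.
children = {
    "BRANCH": 120,
    "FLOATING_POINT": 400,
    "INTEGER": 300,
    "MEMORY": 150,
    "VECTOR": 0,
}
remainder = unclassified(parent_count=1000, child_counts=children)
```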

BRANCH

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of branch instructions
Parent:
INSTRUCTION
Children:
COND_BRANCH + UNCOND_BRANCH

BRANCH_PRED

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of branch instructions which were correctly predicted
Parent:
BRANCH
Children:
None

BRANCH_MISP

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of branch instructions which were mis-predicted
Parent:
BRANCH
Children:
None

FLOATING_POINT

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of floating-point instructions
Parent:
INSTRUCTION
Children:
FP_ADD + FP_MUL + FP_FMA + FP_DIV + FP_INV + FP_SQRT + FP_MISC

FP_ADD

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of floating-point addition instructions
Parent:
FLOATING_POINT
Children:
None

FP_MUL

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of floating-point multiplication instructions
Parent:
FLOATING_POINT
Children:
None

FP_FMA

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of floating-point fused multiply-add instructions
Parent:
FLOATING_POINT
Children:
None

FP_DIV

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of floating-point division instructions
Parent:
FLOATING_POINT
Children:
None

FP_INV

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of floating-point inverse (reciprocal) instructions
Parent:
FLOATING_POINT
Children:
None

FP_SQRT

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of floating-point square-root instructions
Parent:
FLOATING_POINT
Children:
None

FP_MISC

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of miscellaneous floating-point instructions such as moves and estimates
Parent:
FLOATING_POINT
Children:
None

INTEGER

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of fixed-point (integer) instructions
Parent:
INSTRUCTION
Children:
None

MEMORY

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of memory-referencing instructions
Parent:
INSTRUCTION
Children:
LOAD + STORE + SYNCH

LOAD

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of memory load (read) instructions
Parent:
MEMORY
Children:
None

STORE

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of memory store (write) instructions
Parent:
MEMORY
Children:
None

SYNCH

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of memory synchronization instructions
Parent:
MEMORY
Children:
None

VECTOR

Keywords:
Hardware counter
Unit:
Number of instructions
Description:
Number of vector instructions
Parent:
INSTRUCTION
Children:
None

Data Access Patterns

 

DATA_ACCESS

Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data accesses
Parent:
None
Children:
DATA_HIT_L1$ + DATA_HIT_L2$ + DATA_HIT_L3$ + DATA_HIT_MEM

DATA_HIT_L1$

Synonym:
L1_D_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data accesses (stores and loads) which hit in 1st-level cache
Parent:
DATA_ACCESS
Children:
DATA_STORE_INTO_L1$ + DATA_LOAD_FROM_L1$

DATA_STORE_INTO_L1$

Synonyms:
L1_D_WRITE_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data stores (writes) which hit in 1st-level cache
Parent:
DATA_HIT_L1$
Children:
None

DATA_LOAD_FROM_L1$

Synonyms:
L1_D_READ_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data loads (reads) which hit in 1st-level cache
Parent:
DATA_HIT_L1$
Children:
None

DATA_HIT_L2$

Synonym:
L2_D_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data accesses (stores and loads) which miss in 1st-level cache and hit in 2nd-level cache
Parent:
DATA_ACCESS
Children:
DATA_STORE_INTO_L2$ + DATA_LOAD_FROM_L2$

DATA_STORE_INTO_L2$

Synonyms:
L2_D_WRITE_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data stores (writes) which miss in 1st-level cache and hit in 2nd-level cache
Parent:
DATA_HIT_L2$
Children:
None

DATA_LOAD_FROM_L2$

Synonyms:
L2_D_READ_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data loads (reads) which miss in 1st-level cache and hit in 2nd-level cache
Parent:
DATA_HIT_L2$
Children:
None

DATA_HIT_L3$

Synonym:
L3_D_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data accesses (stores and loads) which miss in 1st-level and 2nd-level caches and hit in 3rd-level cache
Parent:
DATA_ACCESS
Children:
DATA_STORE_INTO_L3$ + DATA_LOAD_FROM_L3$

DATA_STORE_INTO_L3$

Synonyms:
L3_D_WRITE_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data stores (writes) which miss in 1st-level and 2nd-level caches and hit in 3rd-level cache
Parent:
DATA_HIT_L3$
Children:
None

DATA_LOAD_FROM_L3$

Synonyms:
L3_D_READ_HIT
Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data loads (reads) which miss in 1st-level and 2nd-level caches and hit in 3rd-level cache
Parent:
DATA_HIT_L3$
Children:
None

DATA_HIT_MEM

Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data accesses (stores and loads) which miss in all caches and must go to memory (system)
Parent:
DATA_ACCESS
Children:
DATA_STORE_INTO_MEM + DATA_LOAD_FROM_MEM

DATA_STORE_INTO_MEM

Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data stores (writes) which miss in all caches and must go to memory (system)
Parent:
DATA_HIT_MEM
Children:
None

DATA_LOAD_FROM_MEM

Keywords:
Hardware counter
Unit:
Number of data accesses
Description:
Total data loads (reads) which miss in all caches and must go to memory (system)
Parent:
DATA_HIT_MEM
Children:
None
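
Since every data access is attributed to exactly one level of the hierarchy above, the share served by each level can be sketched as a fraction of the total. The counts are hypothetical, and treating DATA_ACCESS as the sum of its four children is an assumption taken from the Children field.

```python
def hit_fractions(hits_by_level):
    """Fraction of all data accesses that is satisfied at each level."""
    total = sum(hits_by_level.values())   # DATA_ACCESS under the stated assumption
    if total == 0:
        return {level: 0.0 for level in hits_by_level}
    return {level: count / total for level, count in hits_by_level.items()}


# Hypothetical counts for one call path.
counts = {
    "DATA_HIT_L1$": 900,
    "DATA_HIT_L2$": 60,
    "DATA_HIT_L3$": 30,
    "DATA_HIT_MEM": 10,
}
fractions = hit_fractions(counts)
```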

Instruction Access Patterns

 

INST_ACCESS

Keywords:
Hardware counter
Unit:
Number of instruction accesses
Description:
Total instruction accesses (fetches)
Parent:
None
Children:
INST_HIT_PREF + INST_HIT_L1$ + INST_HIT_L2$ + INST_HIT_L3$ + INST_HIT_MEM

INST_HIT_PREF

Keywords:
Hardware counter
Unit:
Number of instruction accesses
Description:
Total instruction prefetches
Parent:
INST_ACCESS
Children:
None

INST_HIT_L1$

Synonym:
L1_I_HIT
Keywords:
Hardware counter
Unit:
Number of instruction accesses
Description:
Total instruction accesses (fetches) which hit in 1st-level cache
Parent:
INST_ACCESS
Children:
None

INST_HIT_L2$

Synonym:
L2_I_HIT
Keywords:
Hardware counter
Unit:
Number of instruction accesses
Description:
Total instruction accesses (fetches) which miss in 1st-level cache and hit in 2nd-level cache
Parent:
INST_ACCESS
Children:
None

INST_HIT_L3$

Synonym:
L3_I_HIT
Keywords:
Hardware counter
Unit:
Number of instruction accesses
Description:
Total instruction accesses (fetches) which miss in 1st-level and 2nd-level caches and hit in 3rd-level cache
Parent:
INST_ACCESS
Children:
None

INST_HIT_MEM

Keywords:
Hardware counter
Unit:
Number of instruction accesses
Description:
Total instruction accesses (fetches) which miss in all caches and must go to memory (system)
Parent:
INST_ACCESS
Children:
None

1st-level Cache Patterns

 

L1_ACCESS

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level cache accesses
Parent:
None
Children:
L1_INST + L1_LOAD + L1_STORE

L1_INST

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level instruction-cache accesses
Parent:
L1_ACCESS
Children:
L1_INST_HIT + L1_INST_MISS

L1_INST_HIT

Synonym:
L1_I_HIT
Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level instruction-cache hits
Parent:
L1_INST
Children:
None

L1_INST_MISS

Synonym:
L1_I_MISS
Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level instruction-cache misses
Parent:
L1_INST
Children:
None

L1_LOAD

Synonym:
L1_D_READ
Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level data-cache loads (reads)
Parent:
L1_ACCESS
Children:
L1_LOAD_HIT + L1_LOAD_MISS

L1_LOAD_HIT

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level data-cache load (read) hits
Parent:
L1_LOAD
Children:
None

L1_LOAD_MISS

Synonym:
L1_D_READ_MISS
Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level data-cache load (read) misses
Parent:
L1_LOAD
Children:
None

L1_STORE

Synonym:
L1_D_WRITE
Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level data-cache stores (writes)
Parent:
L1_ACCESS
Children:
L1_STORE_HIT + L1_STORE_MISS

L1_STORE_HIT

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level data-cache store (write) hits
Parent:
L1_STORE
Children:
None

L1_STORE_MISS

Synonym:
L1_D_WRITE_MISS
Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level data-cache store (write) misses
Parent:
L1_STORE
Children:
None

L1_D_MISS

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 1st-level data-cache misses
Parent:
None (Not currently parented)
Children:
L1_D_READ_MISS + L1_D_WRITE_MISS
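
A derived miss ratio for the 1st-level data cache can be sketched from the load and store counters listed above. The ratio itself is not one of the patterns in this catalogue; it is shown only as an example of combining them, with hypothetical counts.

```python
def l1_data_miss_ratio(load_hit, load_miss, store_hit, store_miss):
    """Share of 1st-level data-cache accesses that miss."""
    misses = load_miss + store_miss                            # L1_D_MISS
    accesses = load_hit + load_miss + store_hit + store_miss   # L1_LOAD + L1_STORE
    return misses / accesses if accesses else 0.0


# Hypothetical counts for one call path.
ratio = l1_data_miss_ratio(load_hit=700, load_miss=50, store_hit=225, store_miss=25)
```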

2nd-level Cache Patterns

 

L2_ACCESS

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 2nd-level cache accesses
Parent:
None
Children:
L2_HIT + L2_MISS

L2_HIT

Keywords:
Hardware counter
Unit:
Number of access hits
Description:
Total 2nd-level cache hits
Parent:
L2_ACCESS
Children:
L2_INST_HIT + L2_LOAD_HIT + L2_STORE_HIT

L2_INST_HIT

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 2nd-level instruction-cache hits
Parent:
L2_HIT
Children:
None

L2_LOAD_HIT

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 2nd-level data-cache load (read) hits
Parent:
L2_HIT
Children:
None

L2_STORE_HIT

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 2nd-level data-cache store (write) hits
Parent:
L2_HIT
Children:
None

L2_MISS

Keywords:
Hardware counter
Unit:
Number of access misses
Description:
Total 2nd-level cache misses
Parent:
L2_ACCESS
Children:
L2_INST_MISS + L2_LOAD_MISS + L2_STORE_MISS

L2_INST_MISS

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 2nd-level instruction-cache misses
Parent:
L2_MISS
Children:
None

L2_LOAD_MISS

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 2nd-level data-cache load (read) misses
Parent:
L2_MISS
Children:
None

L2_STORE_MISS

Keywords:
Hardware counter
Unit:
Number of accesses
Description:
Total 2nd-level data-cache store (write) misses
Parent:
L2_MISS
Children:
None

TLB Access Patterns

 

TLB_ACCESS

Keywords:
Hardware counter
Unit:
Number of TLB accesses
Description:
Total TLB (Translation Lookaside Buffer) accesses
Parent:
None
Children:
DATA_TLB_ACCESS + INST_TLB_ACCESS

DATA_TLB_ACCESS

Keywords:
Hardware counter
Unit:
Number of Data-TLB accesses
Description:
Total Data-TLB (Translation Lookaside Buffer) accesses
Parent:
TLB_ACCESS
Children:
DATA_TLB_HIT + DATA_TLB_MISS

DATA_TLB_HIT

Keywords:
Hardware counter
Unit:
Number of Data-TLB hits
Description:
Data-TLB (Translation Lookaside Buffer) hits
Parent:
DATA_TLB_ACCESS
Children:
None

DATA_TLB_MISS

Synonym:
TLB_D_MISS
Keywords:
Hardware counter
Unit:
Number of Data-TLB misses
Description:
Data-TLB (Translation Lookaside Buffer) misses
Parent:
DATA_TLB_ACCESS
Children:
None

INST_TLB_ACCESS

Keywords:
Hardware counter
Unit:
Number of Instruction-TLB accesses
Description:
Total Instruction-TLB (Translation Lookaside Buffer) accesses
Parent:
TLB_ACCESS
Children:
INST_TLB_HIT + INST_TLB_MISS

INST_TLB_HIT

Keywords:
Hardware counter
Unit:
Number of Instruction-TLB hits
Description:
Instruction-TLB (Translation Lookaside Buffer) hits
Parent:
INST_TLB_ACCESS
Children:
None

INST_TLB_MISS

Synonym:
TLB_I_MISS
Keywords:
Hardware counter
Unit:
Number of Instruction-TLB misses
Description:
Instruction-TLB (Translation Lookaside Buffer) misses
Parent:
INST_TLB_ACCESS
Children:
None
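
The Parent and Children fields used throughout this catalogue describe a tree, and the two fields should agree with each other. The sketch below encodes the TLB patterns as a dictionary and checks that agreement; the dictionary layout and the function name are illustrative choices, not a format used by KOJAK.

```python
patterns = {
    "TLB_ACCESS": {"parent": None, "children": ["DATA_TLB_ACCESS", "INST_TLB_ACCESS"]},
    "DATA_TLB_ACCESS": {"parent": "TLB_ACCESS", "children": ["DATA_TLB_HIT", "DATA_TLB_MISS"]},
    "DATA_TLB_HIT": {"parent": "DATA_TLB_ACCESS", "children": []},
    "DATA_TLB_MISS": {"parent": "DATA_TLB_ACCESS", "children": []},
    "INST_TLB_ACCESS": {"parent": "TLB_ACCESS", "children": ["INST_TLB_HIT", "INST_TLB_MISS"]},
    "INST_TLB_HIT": {"parent": "INST_TLB_ACCESS", "children": []},
    "INST_TLB_MISS": {"parent": "INST_TLB_ACCESS", "children": []},
}


def inconsistent_links(patterns):
    """Return (parent, child) pairs whose Parent and Children fields disagree."""
    problems = []
    for name, entry in patterns.items():
        # Every listed child must exist and name this pattern as its parent.
        for child in entry["children"]:
            if child not in patterns or patterns[child]["parent"] != name:
                problems.append((name, child))
        # Every named parent must exist and list this pattern as a child.
        parent = entry["parent"]
        if parent is not None:
            if parent not in patterns or name not in patterns[parent]["children"]:
                problems.append((parent, name))
    return problems


problems = inconsistent_links(patterns)
```

An empty result means that every child names the parent that lists it, which is the property a reader relies on when navigating between entries.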