Fjalar is a framework that facilitates the construction of dynamic analysis tools for programs written in C and C++. This document serves as a guide to building tools on top of the Fjalar framework. This version of the manual describes Fjalar version 1.3.
valgrind-3/valgrind/inst/bin/valgrind --tool=fjalar <command-line args>
The actual executable is valgrind because Fjalar is implemented as a Valgrind tool (and your Fjalar tool is compiled together with Fjalar). This command can be fairly tedious to type, so you should probably make a shell script to alias it. The only mandatory command-line argument is the name of the target program (the program to analyze). Use --help as one of your arguments to view a list of command-line options.
In order for Fjalar to work, the target program must be compiled with DWARF2 debugging information (on an x86/Linux system). Look at basic-tool-test.c for a simple target program that exercises Fjalar's function entrance/exit tracking and array bounds checking features. First, compile it:
gcc -gdwarf-2 basic-tool-test.c -o basic-tool-test
(The -gdwarf-2 option includes debugging information in the DWARF2 format.) Now you should be able to run Fjalar from this directory (assuming that you have successfully compiled and installed it) with the following command:
../../inst/bin/valgrind --tool=fjalar ./basic-tool-test
If all goes well, the tool should print out the name of each function at its entrance and exit, along with the names of all variables visible at that point in execution (as well as array sizes, if relevant).
Here are some tips related to executing Fjalar tools:
if (fjalar_with_gdb) { int x = 0; while (!x) {} }
Selective program point and variable tracing:
Misc. options:
struct foo f, what is the value of f?) However, some tools require struct variables to be output, so we have included this option.
Debugging:
gdb to the running process by running gdb on inst/lib/valgrind/x86-linux/fjalar and using the attach command. (See Debugging Fjalar tools for more details.)
When a Fjalar tool is run on a large target program, it often visits too many variables, which can slow execution and produce an overwhelming amount of output. It is often desirable to trace only a specific portion of the target program: the program points and variables that are of interest for a particular application. For instance, one may only be interested in tracking changes to a particular global data structure during calls to a specific set of functions (program points), and thus have no need for information about any other program points or variables in the trace file. The --ppt-list-file and --var-list-file options can be used to achieve such selective tracing.
The program point list file (abbreviated as ppt-list-file) consists of a newline-separated list of names of functions that the user wants Fjalar to trace. Every name corresponds to both the entrance and exit program points for that function and must be written in exactly the format that Fjalar uses for that function. Here is an example of a ppt-list-file:
FunctionNamesTest.cpp.staticFoo(int, int)
..firstFileFunction(int)
..main()
second_file.cpp.staticFoo(int, int)
..secondFileFunction()
It is very important to follow this format in the ppt-list-file because Fjalar performs string comparisons to determine which program points to trace. Thus, it is often easier to have Fjalar generate a ppt-list-file that contains a list of all program points in a target program by using the --dump-ppt-file option, and then either comment out (by using the '#' comment character at the beginning of the line) or delete lines in that file for program points not to be traced, or create a new ppt-list-file using the names in the Fjalar-generated file. This prevents typos and the tedium of manually typing up program point names.
That file represents all the program points that Fjalar would normally trace. If the user wanted to trace only the main() function, they could comment out all other lines by placing a single '#' character at the beginning of each line to be commented out, as demonstrated here:
#FunctionNamesTest.cpp.staticFoo(int, int)
#..firstFileFunction(int)
..main()
#second_file.cpp.staticFoo(int, int)
#..secondFileFunction()
When running Fjalar with the --ppt-list-file option using this as the ppt-list-file, Fjalar only pauses the execution of the target program at the entrance and exit of main() in order to run the tool's code. There is almost no overhead for all of the other program point executions; thus, Fjalar performs quite well when tracing selected program points, even within extremely large target programs.
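The comment and exact-match rules described above can be sketched in C. This is only an illustration, not Fjalar's actual implementation: the function name ppt_is_listed and the choice to pass the file contents as an in-memory string are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of Fjalar): returns true if `name`
 * appears as an uncommented line in `list_text`, the in-memory contents
 * of a ppt-list-file. Lines that start with '#' are comments, and
 * matching is an exact string comparison of the whole line. */
bool ppt_is_listed(const char *list_text, const char *name)
{
    assert(list_text != NULL && name != NULL);
    size_t name_len = strlen(name);
    const char *line = list_text;

    while (*line != '\0') {
        /* Length of the current line, excluding the newline. */
        size_t line_len = strcspn(line, "\n");

        if (name_len > 0 && line[0] != '#' &&
            line_len == name_len &&
            strncmp(line, name, name_len) == 0) {
            return true;
        }

        line += line_len;
        if (*line == '\n') {
            line++;  /* step over the newline */
        }
    }
    return false;
}
```

Because the comparison covers the whole line, a name that differs by even one character (for instance, a missing parameter list) does not match, which is why generating the file with --dump-ppt-file is less error-prone than typing names by hand.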
The variable list file (abbreviated as var-list-file) contains all of the variables that the user wants Fjalar to trace. There is one section for global variables and a section for variables associated with each function (formal parameters and return values). Again, the best way to create a var-list-file is to have Fjalar generate a file with all variables included using the --dump-var-file option and then modify that file for one's particular needs by either deleting or commenting out lines (again using the '#' comment character). Here is an example var-list-file:
----SECTION----
globals
/globalIntArray
/globalIntArray[]
/anotherGlobalIntArray
/anotherGlobalIntArray[]

----SECTION----
FunctionNamesTest.cpp.staticFoo()
x
y

----SECTION----
..firstFileFunction(int)
blah

----SECTION----
..main()
argc
argv
argv[]
return

----SECTION----
second_file.cpp.staticFoo()
x
y

----SECTION----
..secondFileFunction()
The file format is quite straightforward. Each section is marked by a special string “----SECTION----” on a line by itself followed immediately by a line that denotes either the program point name or the special string “globals”. This is followed by a newline-delimited list of all variables to be visited for that particular program point. (Global variables listed in the globals section are visited for all program points.) For clarity, one or more blank lines should separate neighboring sections, although the “----SECTION----” string literal on a line by itself is the only required delimiter. If an entire section is missing, then no variables for that program point (or no global variables, if it is the special globals section) are traced.
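As a rough sketch of how this format can be read, the following C function checks whether a variable is listed, uncommented, in a given section. It is an illustration under stated assumptions rather than Fjalar's own parser: the name var_is_listed and the in-memory string interface are hypothetical, and the function only looks inside the named section (it does not model the rule that globals apply to every program point).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SECTION_MARKER "----SECTION----"

/* True if the line [line, line + len) equals the C string s. */
static bool line_equals(const char *line, size_t len, const char *s)
{
    return strlen(s) == len && strncmp(line, s, len) == 0;
}

/* Hypothetical helper (not part of Fjalar): returns true if `var` is an
 * uncommented entry in the section named `section` ("globals" or a
 * program point name) of the var-list-file contents in `text`. */
bool var_is_listed(const char *text, const char *section, const char *var)
{
    assert(text != NULL && section != NULL && var != NULL);

    bool expect_name = false;  /* previous line was the section marker */
    bool in_section = false;   /* currently inside the requested section */
    const char *line = text;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");

        if (line_equals(line, len, SECTION_MARKER)) {
            expect_name = true;
            in_section = false;
        } else if (expect_name) {
            /* The line right after the marker names the section. */
            in_section = line_equals(line, len, section);
            expect_name = false;
        } else if (in_section && len > 0 && line[0] != '#' &&
                   line_equals(line, len, var)) {
            return true;
        }

        line += len;
        if (*line == '\n') {
            line++;  /* step over the newline */
        }
    }
    return false;
}
```

Blank lines and '#' comment lines never match, so a missing section and a fully commented-out section give the same answer: nothing in it is listed.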
In the program that generated the output for the above example, int* globalIntArray is a global integer pointer variable. For that variable, Fjalar generates two variables: /globalIntArray to represent the hashcode pointer value, and /globalIntArray[] to represent the array of integers referred to by that pointer. The latter is a derived variable that can be thought of as the child of /globalIntArray. If the entry for /globalIntArray is commented out or missing, then Fjalar will not visit any values for /globalIntArray or for any of its children, which in this case is /globalIntArray[]. If a struct or struct pointer variable is commented out or missing, then none of its members are traced. Thus, a general rule about variable entries in the var-list-file is that if a parent variable is not present, then neither it nor its children are traced.
record
record->entries[1]
record->entries[1]->list
record->entries[1]->list->head
record->entries[1]->list->head->magic
For example, if you wanted to trace the value of the magic field nested deep within several layers of structs and arrays, it would not be enough to merely list this variable in the var-list-file. You would need to list all variables that are the parents of this one, as indicated by their names. This can be easily accomplished by creating a file with --dump-var-file and cutting out variable entries, taking care to not cut out entries that are the parents of entries that you want to trace.
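The parent rule can be checked mechanically. The sketch below enumerates the parents of a variable name in C, under a simplifying assumption that is ours and not a documented Fjalar rule: a parent is any prefix of the name that ends just before a "->" or just before a trailing "[]". It reproduces the names in the example above, but it does not handle "."-separated members, and Fjalar's actual notion of parent may differ. The function name nth_parent is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of Fjalar): copies the n-th parent of
 * `name` (0 = outermost) into `out` and returns 1, or returns 0 if
 * there is no such parent or `out` is too small.
 * Assumption: a parent is a prefix of the name that ends right before
 * a "->" or right before a trailing "[]". */
int nth_parent(const char *name, int n, char *out, size_t out_size)
{
    assert(name != NULL && out != NULL && n >= 0);
    size_t len = strlen(name);
    int seen = 0;

    for (size_t i = 1; i < len; i++) {
        int at_arrow = strncmp(name + i, "->", 2) == 0;
        int at_final_brackets = (i + 2 == len) && strcmp(name + i, "[]") == 0;

        if (!at_arrow && !at_final_brackets) {
            continue;
        }
        if (seen++ == n) {
            if (i + 1 > out_size) {
                return 0;  /* output buffer too small */
            }
            memcpy(out, name, i);
            out[i] = '\0';
            return 1;
        }
    }
    return 0;
}
```

Calling nth_parent with n = 0, 1, 2, ... until it returns 0 yields every entry that, under this assumption, must also remain in the var-list-file for the variable to be traced.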
To limit both the program points traced and the variables traced at those program points, the user can run a Fjalar tool with both the --ppt-list-file and --var-list-file options, supplying the appropriate ppt-list-file and var-list-file, respectively. The var-list-file only needs to contain a section for global variables and sections for the program points to be traced, because variable listings for program points that are not traced are irrelevant (their presence in the var-list-file does not affect correctness but does cost unnecessary time and memory).
If the --dump-var-file option is used in conjunction with the --ppt-list-file option, then the only sections generated in the var-list-file will be the global section and sections for all program points explicitly mentioned in the ppt-list-file. This is helpful for generating a smaller var-list-file for use with an existing ppt-list-file.
Fjalar permits users (or external analyses) to specify whether pointers refer to arrays or to single values, and optionally, to specify the type of a pointer (see Pointer type coercion). For example, in
void sum(int* array, int* result) { ... }  // definition of "sum"
...
int a[40];
int total;
...
sum(a, &total);  // use of "sum"
the first pointer parameter refers to an array while the second refers to a single value. Fjalar should treat these values differently. For instance, *array is better observed as array[], an array of integers, and result[] isn't a sensible array at all, even though in C result[0] is semantically identical to *result.
By default, Fjalar treats all pointers as referencing arrays. For instance, it would visit result[] rather than result[0] and would indicate that the length of array result[] is always 1.
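To make the distinction concrete, here is one possible body for the sum() example. The original only shows the signature, so the body and the ARRAY_LEN constant are assumptions for illustration. The two parameters have the same C type, int*, yet one addresses 40 elements and the other exactly one, which is the information a disambig file conveys to Fjalar.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN 40  /* assumed to match "int a[40]" in the example above */

/* Hypothetical body for sum(): `array` points to ARRAY_LEN ints, while
 * `result` points to a single int. */
void sum(int *array, int *result)
{
    assert(array != NULL && result != NULL);
    *result = 0;
    for (size_t i = 0; i < ARRAY_LEN; i++) {
        /* result[0] names the same object as *result, but result[1]
         * would be out of bounds: result is not really an array. */
        result[0] += array[i];
    }
}
```

Nothing in the signature tells an analysis tool which pointer is which; that knowledge comes only from how the caller uses the function.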
One can indicate to Fjalar that certain pointers refer to single elements rather than to arrays.
Information about whether each pointer refers to an array or a single element can be specified in a “disambig file” that resides in the same directory as the target program (by default). The --disambig option instructs Fjalar to read this file if it exists. (If it does not exist, Fjalar produces the file automatically and, if invoked along with the --smart-disambig option, heuristically infers whether each pointer variable refers to single or multiple elements. Thus, users can edit this file for use on subsequent runs rather than having to create it from scratch.) The disambig file lists all the program points and user-defined types, and under each, lists certain types of variables along with their custom disambiguation types as shown below. The list of disambiguation options is:
char and unsigned char:
char and unsigned for unsigned char. (Default)
char and unsigned char:
The disambig file that Fjalar creates contains a section for each function, which can be used to disambiguate parameter variables visible at that function's entrance program point and parameter and return value variables visible at that function's exit program point. It also contains a section for every user-defined struct/class, which can be used to disambiguate member variables of that struct/class. Disambiguation information entered here will apply to all instances of a struct/class of that type, at all program points. There is also a section called “globals”, which disambiguates global variables which are output at every program point. The entries in the disambig file may appear in any order, and whole entries or individual variables within a section may be omitted. In this case, Fjalar will retain their default values.
It is possible to use pointer type disambiguation while only tracing selected program points and/or variables in a target program, combining the functionality described in the Pointer type disambiguation and Tracing only part of a program sections. This section describes the interaction of the ppt-list-file, var-list-file, and .disambig files.
The interaction between selective program point tracing (via the ppt-list-file) and pointer type disambiguation is fairly straightforward: If the user creates a .disambig file while running Fjalar with a ppt-list-file that only specifies certain program points, the generated .disambig file will only contain sections for those program points (as well as the global section and sections for each struct type). If the user reads in a .disambig file while running Fjalar with a ppt-list-file, then disambiguation information is applied for all variables at the program points to be traced. This can be much faster and generate a much smaller disambiguation file, one that only contains information about the program points of interest.
The interaction between selective variable tracing (via the var-list-file) and pointer type disambiguation is a bit more complicated. This is because the var-list-file lists variables with munged Fjalar names, but using a .disambig file can actually change those Fjalar variable names. For example, in a sample program, the struct record* bar parameter of foo() is treated like an array by default. Hence, the var-list-file will list the following variables derived from this parameter:
----SECTION----
..foo()
bar
bar[].name
bar[].numbers[0]
bar[].numbers[0][0]
bar[].numbers[1]
bar[].numbers[1][0]
bar[].numbers[2]
bar[].numbers[2][0]
bar[].numbers[3]
bar[].numbers[3][0]
bar[].numbers[4]
bar[].numbers[4][0]
However, if we use a disambiguation file to denote bar as a pointer to a single element, then the var-list-file will instead list the following variables:
----SECTION----
..foo()
bar
bar->name
bar->numbers
bar->numbers[]
Notice how the latter variable list is more compact and reflects the fact that bar is now a pointer to a single struct. Thus, the flattening of the numbers[5] static array member variable is no longer necessary (it was necessary without disambiguation because Fjalar does not support nested arrays of arrays, which would occur if bar were itself an array, since numbers[5] is already an array).
Notice that, with the exception of the base variable bar, all other variable names differ when running without and with disambiguation. Thus, if you used a var-list-file generated on a run without the disambiguation information while running Fjalar with the disambiguation information, the names will not match up at all, and you will not get the proper selective variable tracing behavior.
Thus, this is the suggested way to use selective variable tracing with pointer type disambiguation:
For maximum control of the output, you can use selective program point tracing, variable tracing, and disambiguation together all at once.
In addition to specifying whether a particular pointer refers to one element or to an array of elements, the user can also specify what type of data a pointer refers to. This type coercion acts like an explicit type cast in C, except that it only works on struct/class types and not on primitive types. This feature is useful for traversing inside of data structures with generic void* pointer fields. Another use is to cast a pointer from one that refers to a 'super class' to one that refers to a 'sub class'. This structural equivalence pattern is often found in C programs that emulate object orientation. To coerce a pointer to a particular type, simply write the name of the struct type after the disambiguation letter (e.g., A, P, S, C, I) in the .disambig file:
----SECTION----
function: ..view_foo_and_bar()
f
P foo
b
P bar
Without the type coercion, Fjalar cannot visit anything except for a hashcode for the two void* parameters of this function:
void view_foo_and_bar(void* f, void* b);
With type coercion, though, Fjalar treats f as a foo* and b as bar* and can traverse inside of them. Of course, if those are not the true runtime types of the variables, then Fjalar's traversal will be meaningless.
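In C terms, the coercion above corresponds to the explicit casts in the following sketch. The struct fields (id, weight) and the helper names as_foo and as_bar are hypothetical, since the original does not show the definitions of foo and bar; only the signature of view_foo_and_bar comes from the text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical struct definitions; the real fields are not shown. */
struct foo { int id; };
struct bar { double weight; };

/* Hypothetical helpers: the casts that "P foo" and "P bar" stand for. */
struct foo *as_foo(void *p)
{
    assert(p != NULL);
    return (struct foo *)p;
}

struct bar *as_bar(void *p)
{
    assert(p != NULL);
    return (struct bar *)p;
}

/* The parameters are declared void*, so their pointee types are not
 * visible from the signature; the function body knows the real types. */
void view_foo_and_bar(void *f, void *b)
{
    printf("foo.id=%d bar.weight=%.1f\n", as_foo(f)->id, as_bar(b)->weight);
}
```

As with a cast in C, nothing verifies that f really points to a foo. If the caller passes some other object, the cast still compiles and the values read through it are meaningless.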
Due to the use of typedefs, there may be more than one name for a particular struct type. The exact name that you need to write in the .disambig file is the one that appears in that file after the usertype prefix. Note that if a struct does not have any pointer fields, then there will be no usertype section for it in the .disambig file. In that case, try different names for the struct if necessary until Fjalar accepts the name (names are all one word long; you will never have to write struct foo). There should be at most a few choices to try. If the coercion is successful, Fjalar prints out a message in the following form while it is processing the .disambig file:
.disambig: Coerced variable f into type 'foo'
.disambig: Coerced variable b into type 'bar'
One more caveat about type coercion is that you can currently only coerce pointers into types that at least one variable in the program (e.g., globals, function parameters, struct fields) belongs to. It is not enough to merely declare a struct type in your source code; you must have a variable of that type somewhere in your program. This is a limitation of the current implementation, but it should not matter most of the time because programs rarely have struct declarations with no variables that belong to that type. If you encounter this problem, you can simply create a global variable of a certain type to make type coercion work.