Specifying SAFECode Type System in Maude

SAFECode is a system for guaranteeing memory safety to programs written in imperative languages like C, C++. This project aims to formalize the SAFECode system as a type extension to the C type system (actually an extension to the LLVM type system similar to the C type system). The language along with the typing rules are specified in a rewriting logic based algebraic specification framework called Maude . Maude specifications are executable giving us a type checker for our type system.

SAFECode Idea



The most novel aspect of SAFECOde is that it guarantees memory safety in the presence of dangling pointers to freed memory. The main idea underlying SAFECode is that it is possible to divide the heap in to seperate regions (or pools) in the memory. (This division is done statically using a context sensitive and flow insensitive pointer analysis called Data Structure Analysis). This division is such that each pool contains only objects of the same C type, i.e. each pool is type homogenous. The key principle that SAFECode uses to guarantee memory safety is the type homogeneity principle : If we allocate an object on top of a previously freed memory object with the same alignment, then dereferencing dangling pointer to freed memory can not cause memory safety violation. This principle allows us to reuse memory, tolerate dangling pointer errors without any overhead. Since the pools identified by our pointer analysis are already type homogenous, we can directly apply the principle and get memory safety. More details about SAFECode are included in the journal publication "Memory Safety Without Garbage Collection for Embedded Applications".

Example



We explain the SAFECode transformations with the help of a C example. The program given below has usage of a dangling pointer to freed memory (This compiled with a normal C compiler would give a core dump).
main() {
   int *x, y, z;
   x = malloc(4);
   *x = 20; 
   y = x ;  //x & y point to the same element
   *y = 30; 
   ....
   z = malloc(4);  
   *z = 50;              
   ...
   free(x); // free the element x 
   ...
   x = malloc(4) ; // allocate another element 
   *y = 40 ; // dangling pointer usage 
   ....

   free(z); 
   free(x);
}

The program after the SAFECode transformation. is given below. The pointer analysis algorithm used by our technique will identify that x and z are not aliased and put them in different memory pools PD1 and PD2. Here each of the two pools is type homogenous and contains objects of type int. So we can use the type homogeneity principle to guarantee memory safety (but get admittedly unexpected result for programs with memory errors).
 
f() {
   int *x, y, z;
   PD1 = poolinit(sizeof(int)); // poolinit initializes a pool PD1
   PD2 = poolinit(sizeof(int)); // this initializes another pool PD2
   x = poolalloc(PD1) ; // poolalloc allocates an element out of the pool PD1
   *x = 20; 
   y = x ; 
   *y = 30; 
   ....
   z = poolalloc(PD2);  // allocates an element out of pool PD2 
   *z = 50;              
   ...
   poolfree(PD1, x); // free the element x 
   ...
   x = poolalloc(PD1) ; // allocate another element PD1
   *y = 40 ; // dangling pointer usage
   ....

   poolfree(PD2, z);
   poolfree(PD1, x);
   pooldestroy(PD1);
   pooldestroy(PD2);
}
  

SAFECode Type System

This project aims to address the question, can we formally prove that SAFECode guarantees memory safety? Towards this end, we design a new language (syntax, typing rules and semantics) to express and analyze programs after SAFECode transformations. With the help of this type system, we aim to formally prove the following results (stated informally) : "If an expression in the program has a type t, it will continue to have type T after every step in the semantics" (preservation theorem) and "The program never gets stuck" (progress theorem). These are called the progress and preservation theorems in the literature. The design and specification of the type system are the goals of this project. The proof of the theorems are still being worked out. With this formal specification and the accompanied proofs of the theorems we get two important benefits.
  1. A formal guarantee that SAFECode system is indeed memory safe.
  2. We don't have to rely on the correctness of the complex pointer analysis for the safety of the program. Before running a program we can simply type-check it in the new type system and if it types, then the program is going to be "memory safe" at run-time.
We now explain the SAFECode type system with the help of the same example given before. The main difference between C and the SAFECode type system is that every pointer in the SAFECode language has a "region variable" attribute that indicates the region the pointer points to. For example, (int@rho1 z) means that z is a pointer to int in region named rho1. This is adapted from the type system of Cyclone . (Cyclone is also a region based type system, but with a major difference, it does not allow for reuse of memory within a region).

Example modified to match the new syntax

main() {
  poolinit(rho1, int) ph1 {  //rho1 is the name of the pool, ph1 is the region handle 
      poolinit(rho2, int) ph2 { //rho2 is the name of the pool, ph2 is the region handle 
      int@rho1 x, y;  //x and y point to region rho1
      int@rho2 z;   // z points to region rho2
      x = poolalloc(ph1, 1); //alocate one element of rho1 
      store x,20;
      ...
      y = x ; 
      *y = 30; 
      ....
      z = poolalloc(ph2);  // allocates an element out of pool rho2
      *z = 50;              
      ...
      poolfree(ph1, x); // free the element x 
      ...
      x = poolalloc(ph1) ; // allocate another element out of rho1
      *y = 40 ; // dangling pointer usage
      ....

      poolfree(Ph2, z); 
      poolfree(Ph1, x);
     }
   }
}
Note that if we had a statement in the above program
   y = z;
This wouldn't type check because the type of y is int@rho1 and type of z is int@rho2 and they are different! This will help us catch any bugs in the implementation of the pointer analysis.

Specification in Maude

We first specified the core LLVM syntax and semantics in LLVM. You can find the specification here . Note it is not complete (see TODO List)
Specification of SAFECode extensions to LLVM. The Syntax and typing rule specfications are done, but the semantics is not yet done. You can find the specification here

ToDo List