VLab Projects
Current Projects
Project Title: Integrated Symbolic execution for Space-Time Analysis of Code (ISSTAC)
Summary
Cybersecurity hinges upon finding vulnerabilities in software systems
before they are deployed in an environment open to malicious actors. As the
implementation flaws in software systems are eliminated by increasingly
sophisticated software analysis techniques, attacks relying on the inherent
space-time complexity of algorithms used for building software systems are
gaining prominence. When an adversary can inexpensively generate inputs
that induce behaviors with expensive space-time resource utilization at
the defender's end, in addition to mounting denial-of-service attacks,
the adversary can also use the same inputs to facilitate side-channel
attacks in order to infer some secret from the observed system behavior.
Our objective in this project is to develop automated analysis techniques and implement them in an industrial-strength tool that allows the efficient analysis of software (in the form of Java bytecode) with respect to these problems, rapidly enough for inclusion in a state-of-the-art development process. The analysis will result in (1) estimates for the worst-case space-time complexity of the algorithms, (2) input constraints that implicitly characterize inputs to the algorithms that trigger the worst-case behavior, (3) concrete test examples for "bad" inputs, and (4) estimates for the information leakage from the system through observables related to time and memory resource utilization.
Our goal is to perform the analysis using a radically improved form of symbolic execution of Java bytecode that will yield the results described above. Symbolic execution is a very powerful code analysis technique that not only implicitly enumerates all program execution paths, but is also capable of constructing the inputs that trigger those paths. To address the scalability, speed, and accuracy issues, we will (1) apply promising new techniques to worst-case and side-channel analysis, (2) develop a novel and efficient model-counting constraint solver that will be used in constructing the malicious inputs and computing the leakage, and (3) implement a cloud-based platform for distributed symbolic execution where multiple program execution paths are explored in parallel.
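As a concrete illustration of the behaviors such an analysis targets, consider the following toy Java example, written for this page rather than produced by the ISSTAC tools. An early-exit comparison makes the observable cost (here, the number of loop iterations) depend on how long a prefix of the secret the input matches; a symbolic-execution-based analysis would derive the input constraints that maximize this cost and, with a model-counting constraint solver, count the inputs behind each observable cost to estimate the leakage.

```java
// Toy example (not the ISSTAC toolchain): an early-exit comparison whose
// observable cost depends on how many characters of the secret a guess matches.
public class TimingSideChannelDemo {

    // Observable cost model: number of loop iterations (a stand-in for time).
    static int compareCost(char[] secret, char[] guess) {
        int iterations = 0;
        int n = Math.min(secret.length, guess.length);
        for (int i = 0; i < n; i++) {
            iterations++;
            if (secret[i] != guess[i]) {
                return iterations;   // early exit leaks the length of the matching prefix
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        char[] secret = "s3cr3t".toCharArray();
        // Guesses matching longer prefixes of the secret incur higher cost, so an
        // attacker observing the cost can recover the secret one character at a time.
        String[] guesses = { "a.....", "sa....", "s3a...", "s3ca..", "s3cr3t" };
        for (String g : guesses) {
            System.out.println(g + " -> cost " + compareCost(secret, g.toCharArray()));
        }
    }
}
```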
Today such worst-case and side-channel analysis is done at the level of individual algorithms either by hand or with minimal automation, while at the level of systems it is mainly done by code reviews and/or testing. Clearly, this is not scalable due to the complexity of the systems involved. If our project succeeds, developers and customers (such as government agencies) will be able to use the tools we develop to determine the worst-case behavior of their code, including input examples that trigger that behavior. Estimates for side-channel capacity will also be available for making informed acceptance decisions. Finally, the analysis will be done rapidly enough for inclusion in a nightly build process.
Funding
This project is funded by DARPA under agreement number FA8750-15-2-0087
(PI: Tevfik Bultan,
Period: 2015-2019,
Amount: $6,000,000, $1,400,000 to UCSB; joint with
Vanderbilt University, PI: Gabor Karsai; CMU, PI: Corina Pasareanu),
under the Space/Time Analysis for Cybersecurity Program.
Project Title: Leveraging Graph Databases for Incremental and Scalable Symbolic Analysis and Verification of Web Applications
Summary
Modern human society relies heavily on web applications and is deeply affected by their poor
dependability. Unfortunately, no matter how much effort we put into verification and validation
of software, existing techniques are inherently limited, and software is routinely released
with bugs and issues that limit its functionality and can dramatically affect the user experience.
This situation is only exacerbated by the characteristics of today’s systems, which are increasingly
dynamic, heterogeneous, distributed, configurable, and ultimately complex. This is especially
the case for web applications, which are increasingly popular and often handle confidential
and sensitive data.
Eliminating the dependability problems that plague today’s web applications requires revolutionary changes to current software verification and validation practices. In this project we propose a high-risk, high-payoff approach that has the potential to significantly improve the scalability and effectiveness of symbolic program analysis and verification techniques for web applications. By integrating program analysis and verification techniques with advanced graph database technology, we plan to achieve efficient storage and retrieval of intermediate and prior analysis results, enabling incremental and differential analysis and verification strategies. Our research will bring the power of big data processing to program analysis and verification, and we believe that the unique combination of ideas we propose, building on techniques from diverse research areas, can have a big impact on improving software dependability.
In particular, in this project we plan to develop techniques and tools that will (1) use symbolic execution and automata-based verification techniques to automatically analyze and verify web applications, (2) store the results of symbolic analyses in a graph database for efficient storage and retrieval, and (3) use incremental and differential analysis strategies that utilize the graph database in order to improve the scalability and effectiveness of software analysis and verification.
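The following simplified Java sketch illustrates the incremental-reuse idea behind the graph database integration; the class and field names are hypothetical and do not reflect the project's actual schema or database backend. Analysis results are stored as graph nodes keyed by a fingerprint of the analyzed method, with edges recording call dependencies, so that after a code change only methods whose own fingerprint or a dependency's fingerprint changed need to be re-analyzed.

```java
import java.util.*;

// Illustrative in-memory stand-in for a graph database of analysis results
// (hypothetical schema): nodes cache per-method summaries, edges record callees.
class AnalysisGraph {
    static final class Node {
        final String methodId;
        final int fingerprint;            // e.g., a hash of the method's bytecode
        final String summary;             // cached symbolic summary for the method
        final Set<String> callees = new HashSet<>();
        Node(String methodId, int fingerprint, String summary) {
            this.methodId = methodId; this.fingerprint = fingerprint; this.summary = summary;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    void store(String methodId, int fingerprint, String summary, Set<String> callees) {
        Node n = new Node(methodId, fingerprint, summary);
        n.callees.addAll(callees);
        nodes.put(methodId, n);
    }

    // Returns the cached summary only if the method and all its callees are unchanged.
    Optional<String> lookup(String methodId, Map<String, Integer> currentFingerprints) {
        Node n = nodes.get(methodId);
        if (n == null || n.fingerprint != currentFingerprints.getOrDefault(methodId, -1)) {
            return Optional.empty();
        }
        for (String callee : n.callees) {
            Node c = nodes.get(callee);
            if (c == null || c.fingerprint != currentFingerprints.getOrDefault(callee, -1)) {
                return Optional.empty();  // a dependency changed: re-analyze this method
            }
        }
        return Optional.of(n.summary);
    }
}
```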
Funding
This project is funded by the NSF grant CCF-1548848
(PI: Tevfik Bultan, Co-PI: Xifeng Yan,
Period: 2015-2017,
Amount: $200,000, $100,000 to UCSB; joint with CMU, PI: Corina Pasareanu; Georgia Tech, PI: Alex Orso)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computing and Communication Foundations (CCF),
Software and Hardware Foundations
Program.
Project Title: Data Model Verification for Web Applications
Summary
Nowadays most of our work, communication, and even entertainment involves
using software applications. Increasingly, software applications are
web-based and not bound to the computing devices we use, be it a desktop,
laptop, tablet, or smartphone. These modern software applications store
their data in remote data centers using cloud computing platforms. In this
modern computing landscape, software becomes a service that is accessible
from any device, anywhere, anytime. This saves users from the hassles
of local data storage, software installation, configuration management,
upgrades, and security patches, all the things one has to deal with when
using traditional desktop software.
However, these benefits come with a cost: the increasing complexity of software applications. Modern software applications are distributed systems that consist of multiple components that run in parallel on multiple machines in the cloud and interact with each other in complex ways via the Internet. As one would expect, developing such software systems is an extremely difficult and error-prone task. Most of the time, developers are not able to find all the errors in a software application before it is released to the users. As a result, bugs in modern software systems are common, leading to unintended behaviors, security vulnerabilities, and crashes. The fact that existing software development practices are not able to produce dependable web applications was recently demonstrated by the well-publicized problems of the HealthCare.gov website.
Web applications typically use a three-tier architecture that consists of a client, a server, and a back-end datastore. The client-side code is responsible for coordinating the interaction with the user, the server-side code implements the business logic and determines the control flow of the application, and the back-end datastore keeps the persistent data. The fundamental building block of a web application is the data model, which specifies the types of objects (e.g., user, account) and the relations among the objects (e.g., the association between users and accounts) stored by the application, the constraints on the relations (e.g., each account must be associated with a user), and the actions that update the data by sending queries to the back-end datastore based on the user input. The data model of a web application is typically specified using an object-relational mapping (ORM) that maps the object-oriented code at the server side to the relational database at the back-end.
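As a small illustration, the following hypothetical JPA-style data model (written for this page, not taken from any application studied in the project) shows how a data model expresses object types, a relation between them, and a constraint on that relation. Data model verification then asks whether every action of the application preserves such constraints on the persistent data.

```java
import javax.persistence.*;   // jakarta.persistence in newer JPA versions
import java.util.*;

// Hypothetical data model: object types User and Account, a one-to-many
// relation between them, and the constraint that every account must be
// associated with a user (optional = false).
@Entity
class User {
    @Id @GeneratedValue
    Long id;

    String name;

    // One user may own many accounts; the relation is stored on the Account side.
    @OneToMany(mappedBy = "owner")
    List<Account> accounts = new ArrayList<>();
}

@Entity
class Account {
    @Id @GeneratedValue
    Long id;

    // Data-model constraint: an account cannot exist without an owning user.
    @ManyToOne(optional = false)
    User owner;
}
```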
Funding
This project is funded by the NSF grant CCF-1423623
(PI: Tevfik Bultan,
Period: 2014-2017,
Amount: $500,000)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computing and Communication Foundations (CCF),
Software and Hardware Foundations
Program.
Past Projects
Project Title: Viewpoints: Discovering Client- and Server-side Input Validation Inconsistencies to Improve Web Application Security
Summary
Nowadays web applications are part of every aspect of society,
from entertainment to business to social interaction. Hence, security
of web applications is an extremely important and urgent problem.
Since web applications are easily accessible, and often store a large
amount of sensitive user information, they are a typical target for
attackers. In particular, attacks that target input validation
vulnerabilities are extremely common and effective. These attacks
exploit erroneous or insufficient validation and sanitization of the
user inputs to inject malicious data that can result in execution of
harmful commands and access to sensitive information. Some of these
attacks exploit well-known vulnerabilities, such as SQL injection and
cross-site scripting, whereas some others exploit application specific
vulnerabilities that are hard to identify because they depend on the
specific input validation logic of the target application.
The goal of this project is to identify and mitigate
vulnerabilities in web applications by performing automatic checking
of input validation and sanitization. The key insight for this
work comes from the observation that developers often introduce
redundant checks in both the front-end (client) and the back-end
(server) components of a web application. Client-side checks are fast
and can improve the performance and responsiveness of the application, but
can be easily circumvented; server-side checks are hard to circumvent,
but require network round-trips and additional server-side processing.
Our intuition is that the checks performed at the client and server
sides should be identical, that is, they should enforce the same set
of constraints on the inputs. If client-side checks are more
restrictive, the server (application) may accept inputs that
legitimate clients can never produce, which is problematic, as
malicious users can easily bypass client-side checks. On the other
hand, if server-side checks are more restrictive, the client may
produce requests that are subsequently rejected by the server, which
is not ideal from a performance point of view. Based on this
intuition, we propose to compare the set of checks performed on one
side to the set of checks performed on the other side, identify and
report inconsistencies between the two sets of checks, and try to
automatically extend the checks to eliminate the inconsistencies.
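The following Java sketch illustrates the core comparison on a hypothetical pair of validation policies; it uses the open-source dk.brics.automaton library for the language operations and is not the project's implementation. Each policy is modeled as a regular language over inputs, and language difference exposes inputs that one side accepts and the other rejects.

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

// Illustrative comparison of client-side and server-side validation policies,
// modeled as regular languages (the policies themselves are hypothetical).
public class ValidationConsistency {
    public static void main(String[] args) {
        // Hypothetical policies: the client restricts usernames to 1-16 word
        // characters, while the server only requires word characters.
        Automaton clientAllowed = new RegExp("[A-Za-z0-9_]{1,16}").toAutomaton();
        Automaton serverAllowed = new RegExp("[A-Za-z0-9_]+").toAutomaton();

        // Inputs accepted by the server but never produced by a legitimate client:
        // candidates an attacker could submit by bypassing the client-side checks.
        Automaton serverOnly = serverAllowed.minus(clientAllowed);
        if (!serverOnly.isEmpty()) {
            System.out.println("Inconsistency: server accepts e.g. \""
                    + serverOnly.getShortestExample(true) + "\"");
        }

        // Inputs a client may send that the server will reject (a performance issue).
        Automaton clientOnly = clientAllowed.minus(serverAllowed);
        System.out.println("Client-only inputs exist: " + !clientOnly.isEmpty());
    }
}
```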
Funding
This project is funded by the NSF collaborative grant CNS-1116967
(PI: Tevfik Bultan, Co-PI: Chris Kruegel,
Period: 2011-2013,
Amount: $500,000, $300,000 to UCSB; joint with Georgia Tech, PI: Alex Orso)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computer and Network Systems (CNS),
Trustworthy Computing Program.
Project Title: Formal Analysis of Distributed Interactions
Summary
Software systems are becoming increasingly more concurrent and distributed.
In fact, nowadays, many software systems consist of multiple components
that execute concurrently, possibly on different machines. Moreover, new
trends in computing, such as service-oriented architecture, cloud computing,
and multi-core hardware, all point to even more concurrency and distribution
among the components of software systems in the future. At the same time,
concurrent and distributed software systems are increasingly used in every
aspect of society and in some cases provide safety-critical services. Hence,
it is extremely important and urgent that computer scientists develop techniques
that guarantee that these software systems behave as expected.
A crucial problem in dependability of concurrent and distributed
software systems is the coordination of different components that form
the whole system. In order to complete a task, components of a software
system have to coordinate their executions by interacting with each other.
Message-based communication is an increasingly common interaction mechanism
used in concurrent and distributed systems where components interact with
each other by sending and receiving messages. The goal of this project is
to develop novel techniques for specification, analysis, and verification
of message-based interactions.
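As a simple illustration of what a specification of message-based interactions can look like, the following Java sketch, written for this page rather than in the project's specification formalism, describes a small conversation protocol as a finite state machine over message labels and checks whether an observed message sequence conforms to it.

```java
import java.util.*;

// Illustrative conversation protocol: a finite state machine over message
// labels; a global message sequence conforms if it drives the machine from
// the initial state to an accepting state. (The protocol is hypothetical.)
public class ConversationProtocol {
    private final Map<String, Map<String, String>> transitions = new HashMap<>();
    private final String initial;
    private final Set<String> accepting;

    ConversationProtocol(String initial, Set<String> accepting) {
        this.initial = initial;
        this.accepting = accepting;
    }

    void addTransition(String from, String message, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(message, to);
    }

    boolean conforms(List<String> conversation) {
        String state = initial;
        for (String msg : conversation) {
            Map<String, String> out = transitions.getOrDefault(state, Map.of());
            if (!out.containsKey(msg)) return false;   // unexpected message
            state = out.get(msg);
        }
        return accepting.contains(state);
    }

    public static void main(String[] args) {
        // Hypothetical purchase conversation: request, then accept or reject; accept is followed by ship.
        ConversationProtocol p = new ConversationProtocol("s0", Set.of("done"));
        p.addTransition("s0", "request", "s1");
        p.addTransition("s1", "reject", "done");
        p.addTransition("s1", "accept", "s2");
        p.addTransition("s2", "ship", "done");

        System.out.println(p.conforms(List.of("request", "accept", "ship"))); // true
        System.out.println(p.conforms(List.of("request", "ship")));           // false
    }
}
```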
Funding
This project is funded by the NSF collaborative grant CCF-1117708
(PI: Tevfik Bultan, Co-PI: Oscar Ibarra,
Period: 2011-2014,
Amount: $494,000, $329,000 to UCSB; joint with Iowa State, PI: Samik Basu)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computing and Communication Foundations (CCF),
Software and Hardware Foundations
Program.
Project Title: Automata Based String Analysis for Detecting Vulnerabilities in Web Applications
Summary
Web applications contain numerous vulnerabilities that can be exploited by attackers to gain unauthorized
access to confidential information and to manipulate sensitive data. Certain input validation vulnerabilities
can be automatically identified by means of static code analysis. In particular, data flow analysis can be used
to track the flow of potentially malicious data from a source (where user input is read) to a sink (a sensitive
program operation).
Traditional data flow analysis considers only binary taint qualifiers, where standard sanitization functions
are assumed to convert a tainted (bad) value to an untainted (safe) value. However, such an approach
misses certain SQL injection vulnerabilities and cannot identify incorrect sanitization routines. To cover a
broader range of security flaws, researchers have proposed to extend data flow analysis with string analysis,
a technique that captures the string values that a certain variable might hold at a particular program point.
Unfortunately, existing string analysis techniques often make conservative approximations that lead to unnecessarily
imprecise results. This makes static vulnerability detection tools report many false positives.
In this project, we propose to develop novel techniques for more precise string analysis. To this end,
we will use an automata-based approach that represents possible values of a string variable at a program
point as a deterministic finite automaton (DFA). Using DFAs, we will develop techniques that support path-sensitivity.
To enable precise analysis of loops, we will develop automata-based widening operations. Furthermore,
we will extend the basic string analysis techniques to a composite analysis where relationships
among string variables and other types of variables can be automatically discovered and analyzed. Using
our novel string analysis techniques, false positives due to overly conservative approximations will be
significantly reduced.
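The following Java sketch illustrates the basic idea on a hypothetical SQL injection example, again using the open-source dk.brics.automaton library rather than our own analysis implementation. The possible values of a query string at a sink are over-approximated by an automaton and intersected with an attack pattern; the intersection becomes empty once the user input has passed through a correct sanitizer.

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

// Illustrative automata-based check at a sink (the program, sanitizer, and
// attack pattern are hypothetical): query = prefix + input + suffix.
public class StringAnalysisSketch {
    public static void main(String[] args) {
        // Unsanitized user input: any string at all.
        Automaton rawInput = Automaton.makeAnyString();
        // A correct sanitizer restricts the input to strings without single quotes.
        Automaton sanitizedInput = new RegExp("[^']*").toAutomaton();

        Automaton prefix = Automaton.makeString("SELECT * FROM users WHERE name='");
        Automaton suffix = Automaton.makeString("'");

        // Attack pattern: any query containing a third quote that closes the string literal early.
        Automaton attack = new RegExp(".*'.*'.*'.*").toAutomaton();

        Automaton unsafeQuery = prefix.concatenate(rawInput).concatenate(suffix);
        Automaton safeQuery = prefix.concatenate(sanitizedInput).concatenate(suffix);

        System.out.println("raw input vulnerable:       "
                + !unsafeQuery.intersection(attack).isEmpty());   // true
        System.out.println("sanitized input vulnerable: "
                + !safeQuery.intersection(attack).isEmpty());     // false
    }
}
```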
In addition to reducing false positives, we also propose to leverage the results of our precise string analysis
to develop novel security solutions and to detect interesting, new classes of vulnerabilities. In particular,
we plan to use string analysis to detect malicious file inclusion vulnerabilities. To identify this important
type of security problem, it is necessary to possess precise information on the arguments of operations that
dynamically load code. Moreover, we propose two novel program capability analyses that use precise string
information to gather information about how an application interacts with a back-end database and the file
system. This allows us to restrict an application’s database access privileges to a required minimum, and it
supports the prevention of directory traversal attacks.
Funding
This project is funded by the NSF grant CCF-0916112
(PI: Tevfik Bultan, Co-PI: Chris Kruegel,
Period: 2009-2012,
Amount: $350,000)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computing and Communication Foundations (CCF),
Trusted Computing Program.
Project Title: Automated Verification of Web Application Data Models
Summary
Although the web began as a medium for sharing information stored in static
HTML pages, it has evolved into a ubiquitous medium for interaction and
dynamism, driven by sophisticated web applications that can provide highly
complex functionality that was once only available in desktop applications.
There is a large stumbling block to this ever-increasing reliance on web
applications: Web applications are not dependable.
The goal of this project is to improve the dependability of
web applications by developing automated techniques for extraction of data
models, and for inference, verification and enforcement of their properties.
Funding
This project is funded by a grant (PI: Tevfik Bultan,
Period: 2012-2013, Amount: $5,683.00)
from the Committee on Research, University
of California, Santa Barbara.
Project Title: Modeling and Analyzing Trust in Service-Oriented Architectures
Summary
Service-oriented architectures (SOAs) enable the dynamic integration of
services implemented on heterogeneous computer systems. An advantage of SOAs
is that they provide high-level, semantically-rich services by abstracting the
low-level details of their implementation. The availability of useful
services such as Google's and Amazon's web service APIs has fostered a number
of applications that integrate multiple services to provide novel, more
complex functionality.
Unfortunately, as is the case with most software, the security of SOAs has
been a secondary concern. Most of the security efforts in the SOA community
have focused on developing web services standards, such as WS-Security,
XACML, and SAML. The problem with these standards is that they only specify an
acceptable level of practice, not the best practice, and this acceptable level
is only achieved if the standard is properly applied. However, experience
shows that both web services and service-based applications are often
developed under strict time constraints by programmers with little security
training. As a result, vulnerable services and service-based applications are
deployed and made available to the whole Internet, creating easily-exploitable
entry points for the compromise of entire networks.
As the service-oriented model is increasingly adopted, there is a need for
rigorous approaches to model trust in SOAs and ensure trust in the composition
of services, in the form of service-based applications. These approaches
should guarantee that by composing services that satisfy certain trust
properties one will obtain an application that satisfies the desired trust
properties.
The goal of this project is to develop novel tools and
techniques for the modeling and analysis of trust properties of SOA-based
applications. More precisely, the proposed research is to develop a
framework that leverages formal models of trust, automated and interactive
verification techniques, and program analysis techniques to address trust
properties of single services and of their compositions, at both the interface
and implementation levels.
Funding
This project is funded by the NSF grant CNS-0716095
(PI: Giovanni Vigna, Co-PIs: Tevfik Bultan, Richard Kemmerer,
Period: 2007-2010,
Amount: $850,000)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computer and Network Systems (CNS),
Cyber Trust Program.
Project Title: Automated Verification of the Native Client
Summary
Native Client (NaCl) is a sandbox for untrusted x86 native code. The NaCl framework enables execution
of x86 native code in web applications in order to achieve the computational performance of native applications
without compromising safety. The NaCl validator checks a set of constraints on NaCl binaries before executing
them to guarantee their safety. The goal of this project is to use automated verification techniques to analyze
the NaCl implementation in order to eliminate bugs, and to improve the confidence in the safety guarantees
provided by the NaCl framework.
Funding
This project is funded by a Google Research Award (PIs: Tevfik Bultan and Christopher Kruegel,
Period: 2010-2011, Amount: $50,000).
Project Title: Design for Verification
Summary
Developing dependable software is one of the most important
challenges in computer science and a scientific approach to design
has to address this problem.
There has been significant progress in automated verification techniques
in recent years; however, scalable software verification remains out of reach.
A design for verification approach, which enables software developers to
document the design decisions that can be useful during verification, can
improve scalability and applicability of automated verification techniques
significantly.
The approach proposed in this project is to use behavioral
interface specifications that support abstraction and modular verification
to achieve scalable verification. These specifications will be embedded
into the code using design patterns which are geared
towards specific classes of verification problems and domain specific
verification techniques. Each design pattern will identify the
design information necessary to achieve scalable verification in a
particular application domain and provide helper classes that will be
used to record the required design information in the code. This
design information will be used to identify the module boundaries and
construct compact models which abstract parts of the code that do
not relate to the properties that are being verified.
Initial applications of this approach to the verification
of synchronization operations in concurrent programs and to the
verification of interactions among distributed components have been
very promising.
These preliminary results suggest that it should be possible to
develop a general design for verification
framework based on these principles.
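The following small Java sketch illustrates the helper-class idea; the class and the open/read/close protocol are hypothetical. The developer records the allowed call order on a component as an interface state machine, which can be enforced with assertions during testing (run with -ea) and reused by a verification tool as an abstract stub for the component when checking client code.

```java
// Illustrative helper class recording a behavioral interface specification:
// allowed transitions are open: CLOSED->OPEN, read: OPEN->OPEN, close: OPEN->CLOSED.
public class FileInterfaceSpec {
    enum State { CLOSED, OPEN }

    private State state = State.CLOSED;

    public void open()  { assert state == State.CLOSED : "open() only when closed"; state = State.OPEN; }
    public void read()  { assert state == State.OPEN   : "read() only when open"; }
    public void close() { assert state == State.OPEN   : "close() only when open"; state = State.CLOSED; }
}
```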
Funding
This project is funded by the NSF grant CCF-0614002
(PI: Tevfik Bultan,
Period: 2006-2008,
Amount: $200,000)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computing and Communication Foundations (CCF),
Science of Design Program.
Project Title: Design for Verification: A New Approach for Developing Highly Dependable Software
Summary
This project will investigate two aspects of design for verification.
The first is enabling
software designers to structure software systems in ways that
make them easier to verify. This requires identifying
the design principles for developing verifiable systems.
The second aspect of design for verification is
to elicit extra information from
the software designers that would be helpful during automated verification.
This requires identifying mechanisms for passing this information
obtained during the design phase to the verification phase.
Funding
This project is funded by a grant (PI: Tevfik Bultan,
Period: 2006-2007, Amount: $7,000)
from the Committee on Research, University
of California, Santa Barbara.
Project Title: Reliable Concurrent Software Development via Reliable Concurrency Controllers
Summary
Run-time errors in concurrent programs are generally due to incorrect usage of
synchronization primitives such as monitors. Conventional validation
techniques such as testing become ineffective for concurrent programs
since the state space increases exponentially with the number of
concurrent processes. In recent years there has been considerable progress
in automated verification techniques for concurrent systems based on model
checking. This project presents a framework for reliable concurrent
programming based on interface-based specification and verification of
concurrency controllers. Proposed specification and verification
techniques for concurrency controllers are modularized by decoupling
behavior and interface specifications. The behavior specification is a set
of actions (which correspond to methods or procedures) composed of guarded
commands. The interface specification is a finite state machine whose
transitions represent actions. Concurrency controllers can be designed
modularly by composing their interfaces. The proposed approach separates
the verification of the concurrency controllers (behavior verification)
from the verification of the threads which use them (interface
verification). For behavior verification it is possible to use symbolic
and infinite-state verification techniques, which enable verification of
controllers with parameterized constants, unbounded variables, and an
arbitrary number of client threads. For interface verification it is
possible to use explicit-state program verification techniques, which
enable verification of arbitrary thread implementations without any
restrictions. The correctness of the user threads can be verified using
stubs generated from the concurrency controller interfaces, which improves
the efficiency of the thread verification significantly. It is possible to
synthesize efficient Java monitors from the concurrency controller
specifications, and the generated implementations preserve the verified
properties of the specifications. The proposed framework will be
implemented as a set of tools for concurrency controller specification,
verification and synthesis.
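The following hand-written Java monitor illustrates the kind of code the framework targets for a classic reader-writer controller; it is an example written for this page, not output of the synthesis tool. The guarded commands of the behavior specification appear as comments above the corresponding actions. The matching interface specification would be a small finite state machine requiring, for example, that each thread call enterRead before exitRead and enterWrite before exitWrite.

```java
// Illustrative reader-writer concurrency controller implemented as a Java
// monitor; each action blocks until its guard holds and then applies its update.
public class ReaderWriterController {
    private int nR = 0;           // number of active readers
    private boolean busy = false; // a writer is active

    // action enterRead:  guard !busy            -> nR := nR + 1
    public synchronized void enterRead() throws InterruptedException {
        while (busy) wait();
        nR++;
    }

    // action exitRead:   guard nR > 0           -> nR := nR - 1
    public synchronized void exitRead() {
        nR--;
        notifyAll();
    }

    // action enterWrite: guard !busy && nR == 0 -> busy := true
    public synchronized void enterWrite() throws InterruptedException {
        while (busy || nR > 0) wait();
        busy = true;
    }

    // action exitWrite:  guard busy             -> busy := false
    public synchronized void exitWrite() {
        busy = false;
        notifyAll();
    }
}
```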
Funding
This project is funded by the NSF grant CCF-0341365
(PI: Tevfik Bultan,
Period: 2003-2007,
Amount: $336,000)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computing and Communication Foundations (CCF),
Trusted Computing Program.
Project Title: Developing Reliable Concurrency-Controllers
Summary
In this project, we use automated verification techniques for developing
reliable concurrency-controllers. Concurrency-controllers are
software components which are
used solely for concurrency control.
We propose a modular approach to development of concurrency-controllers
which: 1) decouples the behavior and the interface specifications
of concurrency-controllers and
2) uses both symbolic and explicit state
automated verification techniques by exploiting this
separation.
Funding
This project is funded by a grant (PI: Tevfik Bultan,
Period: 2003-2004, Amount: $8,913)
from the Committee on Research, University
of California, Santa Barbara.
Project Title: CAREER: Verifiable Specifications: Tools for Reliable Reactive Software Development
Summary
Developing reliable software for reactive systems is a
challenging task. This project focuses on combining
and extending the results from two areas which address
this problem: 1) specification languages and 2) automated
verification methods for reactive systems. Building on the
recent results in these areas, the goal of this project
is to develop verifiable specification languages. This
involves research from two directions: 1) From the
verification side the goal is to extend the scope of
techniques such as model checking to make them applicable
to a wider set of systems. 2) From the specification side
the goal is to restrict the specification languages as much
as possible (without reducing their ability to specify
complex systems) in order to make their automated verification
feasible. This project addresses both theoretical issues
such as complexity and efficiency of model checking
techniques, and practical issues such as investigating
usability of software specification languages and automated
verification techniques in practice.
Funding
This project is funded by the NSF CAREER award CCF-9984822
(PI: Tevfik Bultan,
Period: 2000-2004,
Amount: $200,000)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computing and Communication Foundations (CCF),
Software Engineering and Languages Program (SEL).
Project Title: A Composite Model Checking Toolset for Analyzing Software Systems
Summary
The goal of this project is to investigate a new automated verification
technique, called composite model checking, for analyzing software systems.
Model checking is a technique for searching the state space of a system
to find out
if it satisfies a given property. The success of model checking procedures
relies on the efficiency of the data structures used to represent the states of
the system. In this project a composite model will be developed for combining
multiple type-specific representations so that all variable types can be
represented efficiently during the state space search. This way,
representations such as Binary Decision Diagrams, which are efficient for
encoding Boolean variables, will be combined with representations such as linear
arithmetic constraints, which can encode unbounded variables. A set of tools
will be implemented to support composite model checking. The layered structure
of the toolset will allow other researchers to use it at various levels: 1) for
testing the effectiveness of new representations; 2) for investigating various
model checking strategies such as backward and forward search, partitioning and
abstraction; or 3) for analyzing complex software specifications.
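The following toy Java sketch, written for this page, illustrates the composite representation idea with two stand-in symbolic representations: a set of Boolean valuations in place of a BDD and an integer interval in place of linear arithmetic constraints. Operations on the composite set of states are delegated to the type-specific parts.

```java
import java.util.*;

// Toy composite representation (not the actual toolset): a set of states is a
// conjunction of a Boolean part (stand-in for a BDD) and an integer interval
// (stand-in for linear arithmetic constraints over an unbounded variable).
class CompositeStateSet {
    final Set<List<Boolean>> booleanPart;   // valuations of the Boolean variables
    final long lo, hi;                      // constraint lo <= x <= hi

    CompositeStateSet(Set<List<Boolean>> booleanPart, long lo, long hi) {
        this.booleanPart = booleanPart; this.lo = lo; this.hi = hi;
    }

    CompositeStateSet intersect(CompositeStateSet other) {
        Set<List<Boolean>> b = new HashSet<>(booleanPart);
        b.retainAll(other.booleanPart);                       // conjunction of Boolean parts
        return new CompositeStateSet(b, Math.max(lo, other.lo), Math.min(hi, other.hi));
    }

    boolean isEmpty() {
        return booleanPart.isEmpty() || lo > hi;              // empty if any part is empty
    }

    public static void main(String[] args) {
        // States where flag = true and 0 <= x <= 10.
        CompositeStateSet s1 = new CompositeStateSet(Set.of(List.of(true)), 0, 10);
        // States where flag = true and x >= 7.
        CompositeStateSet s2 = new CompositeStateSet(Set.of(List.of(true)), 7, Long.MAX_VALUE);
        System.out.println(s1.intersect(s2).isEmpty());       // false: flag = true, 7 <= x <= 10
    }
}
```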
Funding
This project is funded by the NSF grant CCF-9970976
(PI: Tevfik Bultan,
Period: 1999-2003,
Amount: $300,000)
provided by the Directorate
for Computer and Information Science and Engineering (CISE),
Division of Computing and Communication Foundations (CCF),
Software Engineering and Languages Program (SEL).
Project Title: Tools and Techniques for Workflow Specifications
Summary
In this project we investigated specification and verification techniques
for workflow specifications.
Workflow systems are used for
organizing the execution of multiple tasks, typically in support of a
business or scientific process.
Since
workflow systems involve concurrent execution of multiple tasks, the
control logic can easily become complicated.
We applied automated verification techniques to workflow systems
and developed several heuristics to improve the efficiency of the
verification task.
In particular, we investigated developing
automated verification techniques for a workflow language called
Vortex developed at Bell Labs.
As a case study, we verified properties of a Vortex
specification for an
automated help system for customers browsing a web store-front.
Funding
This project is funded by a grant
(PIs: Tevfik Bultan and Jianwen Su,
Period: 1999-2000,
Amount: $6,000)
from the Committee on Research, University
of California, Santa Barbara.