University of California, Santa Barbara

Department of Computer Science

VLab Projects

Current Projects


Project Title: Integrated Symbolic Execution for Space-Time Analysis of Code (ISSTAC)

Summary
Cybersecurity hinges upon finding vulnerabilities in software systems before they are deployed in an environment open to malicious actors. As the implementation flaws in software systems are eliminated by increasingly sophisticated software analysis techniques, attacks relying on the inherent space-time complexity of algorithms used for building software systems are gaining prominence. When an adversary can inexpensively generate inputs that induce behaviors with expensive space-time resource utilization at the defender's end, in addition to mounting denial-of-service attacks, the adversary can also use the same inputs to facilitate side-channel attacks in order to infer some secret from the observed system behavior.

Our objective in this project is to develop automated analysis techniques and implement them in an industrial-strength tool that allows the efficient analysis of software (in the form of Java bytecode) with respect to these problems, rapidly enough for inclusion in a state-of-the-art development process. The analysis will result in (1) estimates for the worst-case space-time complexity of algorithms, (2) input constraints that implicitly characterize the inputs that trigger the worst-case behavior, (3) concrete test examples for "bad" inputs, and (4) estimates for the information leakage from the system through observables related to time and memory resource utilization.

Our goal is to perform the analysis using a radically improved form of symbolic execution of Java bytecode that will yield the results described above. Symbolic execution is a very powerful code analysis technique that not only implicitly enumerates all program execution paths, but is also capable of constructing the inputs that trigger those paths. To address scalability, speed, and accuracy issues, we will (1) apply promising new techniques to worst-case and side-channel analysis, (2) develop a novel and efficient model-counting constraint solver that will be used in constructing the malicious inputs and computing the leakage, and (3) implement a cloud-based platform for distributed symbolic execution where multiple program execution paths are explored in parallel.
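
As a concrete illustration of the kind of output such an analysis produces, the toy Java example below (hypothetical, not taken from the ISSTAC tool chain) shows a method with two execution paths whose costs differ sharply; the path condition of the expensive path implicitly characterizes the worst-case inputs, and a concrete witness input is constructed in main.

```java
// Toy example (hypothetical; not part of the ISSTAC tool chain).
// Symbolic execution treats the arguments as symbolic and explores both branches:
//   Path 1: sorted == true   -> binary search, cost logarithmic in data.length
//   Path 2: sorted == false  -> linear scan, cost linear in data.length
// The path condition "sorted == false && key not in data" characterizes the
// worst-case inputs of Path 2; a model-counting constraint solver can also
// estimate how many concrete inputs satisfy it.
public class WorstCaseDemo {

    // Returns the number of loop iterations performed, as a stand-in for cost.
    static long search(int[] data, boolean sorted, int key) {
        long iterations = 0;
        if (sorted) {                      // Path 1
            int lo = 0, hi = data.length - 1;
            while (lo <= hi) {
                iterations++;
                int mid = (lo + hi) >>> 1;
                if (data[mid] == key) return iterations;
                if (data[mid] < key) lo = mid + 1; else hi = mid - 1;
            }
        } else {                           // Path 2
            for (int d : data) {
                iterations++;
                if (d == key) return iterations;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        int[] data = new int[n];
        for (int i = 0; i < n; i++) data[i] = i;
        // A "bad" input on the unsorted path: the key is absent, so every
        // element is inspected -- the behavior a worst-case analysis would flag.
        System.out.println("sorted path:   " + search(data, true, -1));
        System.out.println("unsorted path: " + search(data, false, -1));
    }
}
```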

Today such worst-case and side-channel analysis is done on the level of individual algorithms either by hand or with minimal automation, but on the level of systems it is mainly done by code reviews and/or testing. Clearly, this is not scalable due to the complexity of the systems involved. If our project succeeds, developers and customers (like government agencies) will be able to use the tools we develop to find out the worst-case behavior of their code, including input examples that trigger that behavior. Estimates for side-channel capacity will also be available for making informed acceptance decisions. Finally, the analysis will be done rapidly enough for inclusion in a nightly build process.

Funding
This project is funded by DARPA under agreement number FA8750-15-2-0087 (PI: Tevfik Bultan, Period: 2015-2019, Amount: $6,000,000, $1,400,000 to UCSB; joint with Vanderbilt University, PI: Gabor Karsai; CMU, PI: Corina Pasareanu), under the Space/Time Analysis for Cybersecurity Program.


Project Title: Leveraging Graph Databases for Incremental and Scalable Symbolic Analysis and Verification of Web Applications

Summary
Modern human society relies heavily on web applications and is deeply affected by their poor dependability. Unfortunately, no matter how much effort we put into verification and validation of software, existing techniques are inherently limited, and software is routinely released with bugs and issues that limit its functionality and can dramatically affect the user experience. This situation is only exacerbated by the characteristics of today’s systems, which are increasingly dynamic, heterogeneous, distributed, configurable, and ultimately complex. This is especially the case for web applications, which are increasingly popular and often handle confidential and sensitive data.

Eliminating the dependability problems that plague today's web applications requires revolutionary changes to current software verification and validation practices. In this project we propose a high-risk, high-payoff approach that has the potential to significantly improve the scalability and effectiveness of symbolic program analysis and verification techniques for web applications. By integrating program analysis and verification techniques with advanced graph database technology, we plan to achieve efficient storage and retrieval of intermediate and prior analysis results, enabling incremental and differential analysis and verification strategies. Our research will bring the power of big data processing to program analysis and verification, and we believe that the unique combination of ideas we propose, which builds on techniques from diverse research areas, can have a big impact on improving software dependability.

In particular, in this project we plan to develop techniques and tools that will (1) use symbolic execution and automata-based verification techniques to automatically analyze and verify web applications, (2) store the results of symbolic analyses in a graph database for efficient storage and retrieval, and (3) use incremental and differential analysis strategies that utilize the graph database in order to improve the scalability and effectiveness of software analysis and verification.
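
As a minimal sketch of the storage-and-retrieval idea, the Java class below caches constraint solving results in a graph database so that later (incremental or differential) analysis runs can reuse them. Neo4j and its official Java driver (4.x or later) are our illustrative choice here, and the node schema (a :Constraint node with canonical and sat properties) is hypothetical; the project is not tied to this particular database or schema.

```java
// Illustrative sketch only: assumes a running Neo4j instance and the official
// org.neo4j.driver Java driver (4.x+); the :Constraint node schema is hypothetical.
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class AnalysisResultStore implements AutoCloseable {
    private final Driver driver;

    public AnalysisResultStore(String uri, String user, String password) {
        this.driver = GraphDatabase.driver(uri, AuthTokens.basic(user, password));
    }

    /** Cache the satisfiability result of a path constraint, keyed by a canonical form. */
    public void storeResult(String canonicalConstraint, boolean satisfiable) {
        try (Session session = driver.session()) {
            session.run("MERGE (c:Constraint {canonical: $c}) SET c.sat = $sat",
                        Values.parameters("c", canonicalConstraint, "sat", satisfiable));
        }
    }

    /** Look up a previously computed result; returns null on a cache miss. */
    public Boolean lookup(String canonicalConstraint) {
        try (Session session = driver.session()) {
            Result result = session.run(
                "MATCH (c:Constraint {canonical: $c}) RETURN c.sat AS sat",
                Values.parameters("c", canonicalConstraint));
            return result.hasNext() ? result.next().get("sat").asBoolean() : null;
        }
    }

    @Override
    public void close() { driver.close(); }
}
```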

Funding
This project is funded by the NSF grant CCF-1548848 (PI: Tevfik Bultan, Co-PI: Xifeng Yan, Period: 2015-2017, Amount: $200,000, $100,000 to UCSB; joint with CMU, PI: Corina Pasareanu; Georgia Tech, PI: Alex Orso) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computing and Communication Foundations (CCF), Software and Hardware Foundations Program.


Project Title: Data Model Verification for Web Applications

Summary
Nowadays most of our work, communication, and even entertainment involves using software applications. Increasingly, software applications are web-based and not bound to the computing devices we use, be it a desktop, laptop, tablet, or smartphone. These modern software applications store their data in remote data centers using cloud computing platforms. In this modern computing landscape, software becomes a service that is accessible from any device, anywhere, anytime. This saves users from the hassles of local data storage, software installation, configuration management, upgrades, and security patches: all the things one has to deal with when using traditional desktop software.

However, these benefits come at a cost: the increasing complexity of software applications. Modern software applications are distributed systems that consist of multiple components that run in parallel on multiple machines in the cloud and interact with each other in complex ways via the Internet. As one would expect, developing such software systems is an extremely difficult and error-prone task. Most of the time the developers are not able to find all the errors in a software application before it is released to the users. So, bugs in modern software systems are common, leading to unintended behaviors, security vulnerabilities, and crashes. The fact that existing software development practices are not able to produce dependable web applications was recently demonstrated by the well-publicized problems of the HealthCare.gov website.

Web applications use a three-tier architecture consisting of a client, a server, and a back-end datastore. The client-side code is responsible for coordinating the interaction with the user, the server-side code implements the business logic and determines the control flow of the application, and the back-end datastore keeps the persistent data. The fundamental building block of a web application is its data model, which specifies the types of objects stored by the application (e.g., user, account), the relations among those objects (e.g., the association between users and accounts), the constraints on those relations (e.g., each account must be associated with a user), and the actions that update the data by sending queries to the back-end datastore based on user input. The data model of a web application is typically specified using an object-relational mapping (ORM) that maps the object-oriented code at the server side to the relational database at the back-end.
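
As an illustration, the following JPA-style Java entities sketch the example data model described above (the entity and field names are hypothetical); the constraint that every account must be associated with a user is expressed through a non-nullable many-to-one association. The project's analysis techniques target ORM-based data models in general rather than any particular mapping framework.

```java
// Illustrative JPA-style data model (hypothetical entity names; the project's
// techniques apply to ORM data models in general, not to this specific code).
import javax.persistence.*;
import java.util.ArrayList;
import java.util.List;

@Entity
public class User {
    @Id @GeneratedValue
    private Long id;

    private String name;

    // One user may own many accounts (the relation between the two object types).
    @OneToMany(mappedBy = "owner", cascade = CascadeType.ALL)
    private List<Account> accounts = new ArrayList<>();
}

@Entity
class Account {
    @Id @GeneratedValue
    private Long id;

    private long balance;

    // Constraint from the data model: every account must be associated with a user.
    @ManyToOne(optional = false)
    @JoinColumn(name = "owner_id", nullable = false)
    private User owner;
}
```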

Funding
This project is funded by the NSF grant CCF-1423623 (PI: Tevfik Bultan, Period: 2014-2017, Amount: $500,000) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computing and Communication Foundations (CCF), Software and Hardware Foundations Program.



Past Projects

Project Title: Viewpoints: Discovering Client- and Server-side Input Validation Inconsistencies to Improve Web Application Security

Summary
Nowadays web applications are part of every aspect of society, from entertainment to business to social interaction. Hence, the security of web applications is an extremely important and urgent problem. Since web applications are easily accessible and often store a large amount of sensitive user information, they are a typical target for attackers. In particular, attacks that target input validation vulnerabilities are extremely common and effective. These attacks exploit erroneous or insufficient validation and sanitization of user inputs to inject malicious data that can result in execution of harmful commands and access to sensitive information. Some of these attacks exploit well-known vulnerabilities, such as SQL injection and cross-site scripting, whereas others exploit application-specific vulnerabilities that are hard to identify because they depend on the specific input validation logic of the target application.

The goal of this project is to identify and mitigate vulnerabilities in web applications by performing automatic checking of input validation and sanitization. The key insight for this work comes from the observation that developers often introduce redundant checks in both the front-end (client) and the back-end (server) component of a web application. Client-side checks are fast and can improve the performance and responsiveness of the application, but can be easily circumvented; server-side checks are hard to circumvent, but require network round-trips and additional server-side processing.

Our intuition is that the checks performed at the client and server sides should be identical, that is, they should enforce the same set of constraints on the inputs. If client-side checks are more restrictive, the server (application) may accept inputs that legitimate clients can never produce, which is problematic, as malicious users can easily bypass client-side checks. On the other hand, if server-side checks are more restrictive, the client may produce requests that are subsequently rejected by the server, which is not ideal from a performance point of view. Based on this intuition, we propose to compare the set of checks performed on one side to the set of checks performed on the other side, identify and report inconsistencies between the two sets of checks, and try to automatically extend the checks to eliminate the inconsistencies.
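
The project compares the client- and server-side constraints themselves rather than testing them on sample inputs, but the simple, hypothetical Java sketch below conveys the idea: two validators modeled as regular expressions are compared on a handful of inputs, and any disagreement is reported as an inconsistency.

```java
// A deliberately simple illustration (hypothetical checks, not from the project's
// tools): the client- and server-side validators are modeled as regular
// expressions and compared on sample inputs.  The project's approach compares
// the underlying constraints directly instead of testing them.
import java.util.List;
import java.util.regex.Pattern;

public class ValidationDiff {
    // Client-side check (e.g., in JavaScript): 1-10 letters or digits.
    static final Pattern CLIENT = Pattern.compile("[A-Za-z0-9]{1,10}");
    // Server-side check: 1-10 non-whitespace characters.
    static final Pattern SERVER = Pattern.compile("\\S{1,10}");

    public static void main(String[] args) {
        List<String> samples = List.of("alice", "bob42", "a'--", "<script>", "x y");
        for (String s : samples) {
            boolean client = CLIENT.matcher(s).matches();
            boolean server = SERVER.matcher(s).matches();
            if (client != server) {
                // "a'--" and "<script>" pass the server but not the client:
                // the server accepts inputs no legitimate client would send.
                System.out.printf("inconsistent on %-10s client=%b server=%b%n",
                                  s, client, server);
            }
        }
    }
}
```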

Funding
This project is funded by the NSF collaborative grant CNS-1116967 (PI: Tevfik Bultan, Co-PI: Chris Kruegel, Period: 2011-2013, Amount: $500,000, $300,000 to UCSB; joint with Georgia Tech, PI: Alex Orso) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computer and Network Systems (CNS), Trustworthy Computing Program.



Project Title: Formal Analysis of Distributed Interactions

Summary
Software systems are becoming increasingly concurrent and distributed. In fact, nowadays, many software systems consist of multiple components that execute concurrently, possibly on different machines. Moreover, new trends in computing, such as service-oriented architecture, cloud computing, and multi-core hardware, all point to even more concurrency and distribution among the components of software systems in the future. At the same time, concurrent and distributed software systems are increasingly used in every aspect of society and in some cases provide safety-critical services. Hence, it is extremely important and urgent that computer scientists develop techniques that guarantee that these software systems behave as expected.

A crucial problem in the dependability of concurrent and distributed software systems is the coordination of the different components that form the whole system. In order to complete a task, components of a software system have to coordinate their executions by interacting with each other. Message-based communication is an increasingly common interaction mechanism used in concurrent and distributed systems, where components interact with each other by sending and receiving messages. The goal of this project is to develop novel techniques for specification, analysis, and verification of message-based interactions.
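
As a small, hypothetical illustration of message-based interaction specifications, the Java sketch below specifies a two-party conversation protocol as a finite state machine over message labels and checks a recorded message sequence against it; the project's specification and verification techniques operate on such interaction models themselves rather than on recorded traces.

```java
// Toy sketch (hypothetical protocol; not the project's specification language):
// a conversation between two peers is specified as a finite state machine over
// message labels, and a message sequence is checked against it.
import java.util.List;
import java.util.Map;

public class ConversationChecker {
    // States of the protocol: 0 = start, 1 = logged in, 2 = done.
    // Transitions: start --login--> loggedIn --query--> loggedIn --logout--> done
    static final Map<Integer, Map<String, Integer>> PROTOCOL = Map.of(
        0, Map.of("login", 1),
        1, Map.of("query", 1, "logout", 2),
        2, Map.of()
    );

    static boolean conforms(List<String> trace) {
        int state = 0;
        for (String msg : trace) {
            Integer next = PROTOCOL.getOrDefault(state, Map.of()).get(msg);
            if (next == null) return false;   // message not allowed in this state
            state = next;
        }
        return state == 2;                    // conversation must end after logout
    }

    public static void main(String[] args) {
        System.out.println(conforms(List.of("login", "query", "query", "logout"))); // true
        System.out.println(conforms(List.of("query", "logout")));                   // false
    }
}
```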

Funding
This project is funded by the NSF collaborative grant CCF-1117708 (PI: Tevfik Bultan, Co-PI: Oscar Ibarra, Period: 2011-2014, Amount: $494,000, $329,000 to UCSB; joint with Iowa State, PI: Samik Basu) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computing and Communication Foundations (CCF), Software and Hardware Foundations Program.



Project Title: Automata Based String Analysis for Detecting Vulnerabilities in Web Applications

Summary
Web applications contain numerous vulnerabilities that can be exploited by attackers to gain unauthorized access to confidential information and to manipulate sensitive data. Certain input validation vulnerabilities can be automatically identified by means of static code analysis. In particular, data flow analysis can be used to track the flow of potentially malicious data from a source (where user input is read) to a sink (a sensitive program operation). Traditional data flow analysis considers only binary taint qualifiers, where standard sanitization functions are assumed to convert a tainted (bad) value to an untainted (safe) value. However, such an approach misses certain SQL injection vulnerabilities and cannot identify incorrect sanitization routines. To cover a broader range of security flaws, researchers have proposed to extend data flow analysis with string analysis, a technique that captures the string values that a variable might hold at a particular program point. Unfortunately, existing string analysis techniques often make conservative approximations that lead to unnecessarily imprecise results, causing static vulnerability detection tools to report many false positives.

In this project, we propose to develop novel techniques for more precise string analysis. To this end, we will use an automata-based approach that represents the possible values of a string variable at a program point as a deterministic finite automaton (DFA). Using DFAs, we will develop techniques that support path-sensitivity. To enable precise analysis of loops, we will develop automata-based widening operations. Furthermore, we will extend the basic string analysis techniques to a composite analysis where relationships among string variables and other types of variables can be automatically discovered and analyzed. Using our novel string analysis techniques, false positives due to overly conservative approximations will be significantly reduced.

In addition to reducing false positives, we also propose to leverage the results of our precise string analysis to develop novel security solutions and to detect interesting new classes of vulnerabilities. In particular, we plan to use string analysis to detect malicious file inclusion vulnerabilities. To identify this important type of security problem, it is necessary to possess precise information on the arguments of operations that dynamically load code. Moreover, we propose two novel program capability analyses that use precise string information to gather information about how an application interacts with a back-end database and the file system. This allows us to restrict an application's database access privileges to a required minimum, and it supports the prevention of directory traversal attacks.
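
To illustrate the final step of such an automata-based check, the Java sketch below uses the open-source dk.brics.automaton library (our illustrative choice, not necessarily the project's implementation) to intersect an over-approximation of the string values reaching a sink with a simplistic attack pattern; the sink is reported as vulnerable exactly when the intersection is non-empty.

```java
// Sketch of the final step of an automata-based vulnerability check (assumes the
// open-source dk.brics.automaton library; the regular expressions below are
// illustrative, not the project's attack patterns).  The string analysis would
// compute "valuesAtSink" by propagating automata through the program; here it is
// constructed directly for brevity.
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

public class StringAnalysisCheck {
    public static void main(String[] args) {
        // Over-approximation of the values a variable may hold at a sensitive sink,
        // e.g., after the sanitizer  input.replaceAll("'", "")  on arbitrary input.
        Automaton valuesAtSink = new RegExp("[^']*").toAutomaton();

        // A (simplistic) attack pattern: any string containing a single quote.
        Automaton attackPattern = new RegExp(".*'.*").toAutomaton();

        // The sink is vulnerable iff some reachable value matches the attack pattern.
        Automaton vulnerable = valuesAtSink.intersection(attackPattern);
        System.out.println(vulnerable.isEmpty()
            ? "sanitization is sufficient for this pattern"
            : "potentially vulnerable; witness: " + vulnerable.getShortestExample(true));
    }
}
```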

Funding
This project is funded by the NSF grant CCF-0916112 (PI: Tevfik Bultan, Co-PI: Chris Kruegel, Period: 2009-2012, Amount: $350,000) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computing and Communication Foundations (CCF), Trusted Computing Program.



Project Title: Automated Verification of Web Application Data Models

Summary
Although the web began as a medium for sharing information stored in static HTML pages, it has evolved into a ubiquitous medium for interaction and dynamism, driven by sophisticated web applications that provide highly complex functionality that was once only available in desktop applications. There is a large stumbling block to this ever-increasing reliance on web applications: web applications are not dependable. The goal of this project is to improve the dependability of web applications by developing automated techniques for the extraction of data models, and for the inference, verification, and enforcement of their properties.

Funding
This project is funded by a grant (PI: Tevfik Bultan, Period: 2012-2013, Amount: $5,683.00) from the Committee on Research, University of California, Santa Barbara.



Project Title: Modeling and Analyzing Trust in Service-Oriented Architectures

Summary
Service-oriented architectures (SOAs) enable the dynamic integration of services implemented on heterogeneous computer systems. An advantage of SOAs is that they provide high-level, semantically rich services by abstracting the low-level details of their implementation. The availability of useful services such as Google's and Amazon's web service APIs has fostered a number of applications that integrate these services to provide novel, more complex functionality.

Unfortunately, as is the case with most software, the security of SOAs has been a secondary concern. Most of the security efforts in the SOA community have focused on developing web services standards, such as WS-Security, XACML, and SAML. The problem with these standards is that they only specify an acceptable level of practice, not the best practice, and even this acceptable level is achieved only if the standard is properly applied. However, experience shows that both web services and service-based applications are often developed under strict time constraints by programmers with little security training. As a result, vulnerable services and service-based applications are deployed and made available to the whole Internet, creating easily exploitable entry points for the compromise of entire networks.

As the service-oriented model is increasingly adopted, there is a need for rigorous approaches to model trust in SOAs and to ensure trust in the composition of services, in the form of service-based applications. These approaches should guarantee that by composing services that satisfy certain trust properties one obtains an application that satisfies the desired trust properties. The goal of this project is to develop novel tools and techniques for the modeling and analysis of trust properties of SOA-based applications. More precisely, the proposed research is to develop a framework that leverages formal models of trust, automated and interactive verification techniques, and program analysis techniques to address the trust properties of individual services and of their compositions, at both the interface and implementation levels.

Funding
This project is funded by the NSF grant CNS-0716095 (PI: Giovanni Vigna, Co-PIs: Tevfik Bultan, Richard Kemmerer, Period: 2007-2010, Amount: $850,000) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computer and Network Systems (CNS), Cyber Trust Program.



Project Title: Automated Verification of the Native Client

Summary
Native Client (NaCl) is a sandbox for untrusted x86 native code. The NaCl framework enables execution of x86 native code in web applications in order to achieve the computational performance of native applications without compromising safety. The NaCl validator checks a set of constraints on NaCl binaries before executing them to guarantee their safety. The goal of this project is to use automated verification techniques to analyze the NaCl implementation in order to eliminate bugs and to improve confidence in the safety guarantees provided by the NaCl framework.

Funding
This project is funded by a Google Research Award (PIs: Tevfik Bultan and Christopher Kruegel, Period: 2010-2011, Amount: $50,000).



Project Title: Design for Verification

Summary
Developing dependable software is one of the most important challenges in computer science, and a scientific approach to design has to address this problem. There has been significant progress in automated verification techniques in recent years; however, scalable software verification remains out of reach. A design for verification approach, which enables software developers to document the design decisions that can be useful during verification, can significantly improve the scalability and applicability of automated verification techniques. The approach proposed in this project is to use behavioral interface specifications that support abstraction and modular verification to achieve scalable verification. These specifications will be embedded into the code using design patterns which are geared towards specific classes of verification problems and domain-specific verification techniques. Each design pattern will identify the design information necessary to achieve scalable verification in a particular application domain and provide helper classes that will be used to record the required design information in the code. This design information will be used to identify module boundaries and construct compact models which abstract away parts of the code that do not relate to the properties being verified. Initial applications of this approach to the verification of synchronization operations in concurrent programs and to the verification of interactions among distributed components have produced very promising results. These preliminary results suggest that it should be possible to develop a general design for verification framework based on these principles.

Funding
This project is funded by the NSF grant CCF-0614002 (PI: Tevfik Bultan, Period: 2006-2008, Amount: $200,000) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computing and Communication Foundations (CCF), Science of Design Program.



Project Title: Design for Verification: A New Approach for Developing Highly Dependable Software

Summary
This project will investigate two aspects of design for verification. The first one is enabling software designers to structure the software systems in ways that make them easier to verify. This requires identifying the design principles for developing verifiable systems. The second aspect of design for verification is to elicit extra information from the software designers that would be helpful during automated verification. This requires identifying mechanisms for passing this information obtained during the design phase to the verification phase.

Funding
This project is funded by a grant (PI: Tevfik Bultan, Period: 2006-2007, Amount: $7,000) from the Committee on Research, University of California, Santa Barbara.



Project Title: Reliable Concurrent Software Development via Reliable Concurrency Controllers

Summary
Run-time errors in concurrent programs are generally due to incorrect usage of synchronization primitives such as monitors. Conventional validation techniques such as testing become ineffective for concurrent programs since the state space increases exponentially with the number of concurrent processes. In recent years there has been considerable progress in automated verification techniques for concurrent systems based on model checking.

This project presents a framework for reliable concurrent programming based on interface-based specification and verification of concurrency controllers. The proposed specification and verification techniques for concurrency controllers are modularized by decoupling behavior and interface specifications. The behavior specification is a set of actions (which correspond to methods or procedures) composed of guarded commands. The interface specification is a finite state machine whose transitions represent actions. Concurrency controllers can be designed modularly by composing their interfaces.

The proposed approach separates the verification of the concurrency controllers (behavior verification) from the verification of the threads which use them (interface verification). For behavior verification it is possible to use symbolic and infinite-state verification techniques, which enable verification of controllers with parameterized constants, unbounded variables, and an arbitrary number of client threads. For interface verification it is possible to use explicit-state program verification techniques, which enable verification of arbitrary thread implementations without any restrictions. The correctness of the user threads can be verified using stubs generated from the concurrency controller interfaces, which improves the efficiency of thread verification significantly. It is possible to synthesize efficient Java monitors from the concurrency controller specifications, and the generated implementations preserve the verified properties of the specifications. The proposed framework will be implemented as a set of tools for concurrency controller specification, verification, and synthesis.
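
As a small, hand-written illustration of the kind of controller in question, the Java monitor below implements a reader-writer concurrency controller whose actions correspond to guarded commands (shown in the comments); in the proposed framework such monitors would be synthesized from the verified specification rather than written by hand, and the names here are illustrative.

```java
// Hand-written illustration of a concurrency controller as a Java monitor.
// Each method corresponds to an action specified as a guarded command; the
// interface specification restricts the order in which a thread may call them.
public class ReaderWriterController {
    private int readers = 0;         // behavior variables of the controller
    private boolean writing = false;

    // Action r_enter:  guard: !writing            command: readers = readers + 1
    public synchronized void readerEnter() throws InterruptedException {
        while (writing) wait();
        readers++;
    }

    // Action r_exit:   guard: readers > 0         command: readers = readers - 1
    // (The guard is guaranteed by the interface, so it is not re-checked here.)
    public synchronized void readerExit() {
        readers--;
        if (readers == 0) notifyAll();
    }

    // Action w_enter:  guard: !writing && readers == 0   command: writing = true
    public synchronized void writerEnter() throws InterruptedException {
        while (writing || readers > 0) wait();
        writing = true;
    }

    // Action w_exit:   guard: writing              command: writing = false
    public synchronized void writerExit() {
        writing = false;
        notifyAll();
    }

    // Interface specification (per thread): (readerEnter readerExit | writerEnter writerExit)*
    // Behavior property to verify: writing implies readers == 0 (mutual exclusion).
}
```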

Funding
This project is funded by the NSF grant CCF-0341365 (PI: Tevfik Bultan, Period: 2003-2007, Amount: $336,000) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computing and Communication Foundations (CCF), Trusted Computing Program.



Project Title: Developing Reliable Concurrency-Controllers

Summary
In this project, we use automated verification techniques for developing reliable concurrency-controllers. Concurrency-controllers are software components which are used solely for concurrency control. We propose a modular approach to development of concurrency-controllers which: 1) decouples the behavior and the interface specifications of concurrency-controllers and 2) uses both symbolic and explicit state automated verification techniques by exploiting this separation.

Funding
This project is funded by a grant (PI: Tevfik Bultan, Period: 2003-2004, Amount: $8,913) from the Committee on Research, University of California, Santa Barbara.



Project Title: CAREER: Verifiable Specifications: Tools for Reliable Reactive Software Development

Summary
Developing reliable software for reactive systems is a challenging task. This project focuses on combining and extending the results from two areas which address this problem: 1) specification languages and 2) automated verification methods for reactive systems. Building on recent results in these areas, the goal of this project is to develop verifiable specification languages. This involves research in two directions: 1) from the verification side, the goal is to extend the scope of techniques such as model checking to make them applicable to a wider set of systems; 2) from the specification side, the goal is to restrict the specification languages as much as possible (without reducing their ability to specify complex systems) in order to make their automated verification feasible. This project addresses both theoretical issues, such as the complexity and efficiency of model checking techniques, and practical issues, such as investigating the usability of software specification languages and automated verification techniques in practice.

Funding
This project is funded by the NSF CAREER award CCF-9984822 (PI: Tevfik Bultan, Period: 2000-2004, Amount: $200,000) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computing and Communication Foundations (CCF), Software Engineering and Languages Program (SEL).



Project Title: A Composite Model Checking Toolset for Analyzing Software Systems

Summary
The goal of this project is to investigate a new automated verification technique for analyzing software systems called composite model checking. Model checking is a technique for searching the state space of a system to find out if it satisfies a given property. The success of model checking procedures relies on the efficiency of the data structures used to represent the states of the system. In this project a composite model will be developed for combining multiple type-specific representations so that all variable types can be represented efficiently during the state space search. This way, representations such as Binary Decision Diagrams, which are efficient for encoding Boolean variables, will be combined with representations such as linear arithmetic constraints, which can encode unbounded variables. A set of tools will be implemented to support composite model checking. The layered structure of the toolset will allow other researchers to use it at various levels: 1) for testing the effectiveness of new representations; 2) for investigating various model checking strategies such as backward and forward search, partitioning, and abstraction; or 3) for analyzing complex software specifications.
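
The toy Java sketch below illustrates the composite idea with deliberately simple stand-ins: a set of Boolean valuations (in place of a BDD) paired with an integer interval (in place of general linear arithmetic constraints). A composite set is the conjunction of its components, so operations such as intersection are computed component-wise with each type-specific representation. The class and variable names are hypothetical.

```java
// Toy illustration of a composite state representation (plain Java stand-ins:
// an EnumSet over Boolean valuations instead of a BDD, and an integer interval
// instead of general linear arithmetic constraints).  Hypothetical code, only
// meant to show how type-specific representations are combined conjunctively.
import java.util.EnumSet;

public class CompositeState {
    // Valuations of a single Boolean variable b.
    enum BoolVal { B_FALSE, B_TRUE }

    final EnumSet<BoolVal> boolPart;   // which valuations of b are in the set
    final long lo, hi;                 // interval constraint on integer variable x

    CompositeState(EnumSet<BoolVal> boolPart, long lo, long hi) {
        this.boolPart = boolPart; this.lo = lo; this.hi = hi;
    }

    // Intersection (used, e.g., when checking reachability of an error
    // condition) is computed component-wise with each representation.
    CompositeState intersect(CompositeState other) {
        EnumSet<BoolVal> b = EnumSet.copyOf(boolPart);
        b.retainAll(other.boolPart);
        return new CompositeState(b, Math.max(lo, other.lo), Math.min(hi, other.hi));
    }

    boolean isEmpty() {
        return boolPart.isEmpty() || lo > hi;
    }

    public static void main(String[] args) {
        // Reachable states: b == true and 0 <= x <= 10.
        CompositeState reachable = new CompositeState(EnumSet.of(BoolVal.B_TRUE), 0, 10);
        // Error condition: any b and x >= 100.
        CompositeState error = new CompositeState(EnumSet.allOf(BoolVal.class), 100, Long.MAX_VALUE);
        System.out.println("error reachable: " + !reachable.intersect(error).isEmpty());
    }
}
```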

Funding
This project is funded by the NSF grant CCF-9970976 (PI: Tevfik Bultan, Period: 1999-2003, Amount: $300,000) provided by the Directorate for Computer and Information Science and Engineering (CISE), Division of Computing and Communication Foundations (CCF), Software Engineering and Languages Program (SEL).



Project Title: Tools and Techniques for Workflow Specifications

Summary
In this project we investigated specification and verification techniques for workflow systems. Workflow systems are used for organizing the execution of multiple tasks, typically in support of a business or scientific process. Since workflow systems involve concurrent execution of multiple tasks, the control logic can easily become complicated. We applied automated verification techniques to workflow systems and developed several heuristics to improve the efficiency of the verification task. In particular, we investigated automated verification techniques for a workflow language called Vortex, developed at Bell Labs. As a case study, we verified properties of a Vortex specification for an automated help system for customers browsing a web storefront.

Funding
This project is funded by a grant (PIs: Tevfik Bultan and Jianwen Su, Period: 1999-2000, Amount: $6,000) from the Committee on Research, University of California, Santa Barbara.