The design and implementation of decision procedures for checking
path feasibility in string-manipulating programs is an important
problem, with such applications as symbolic execution of programs
with strings and automated detection of cross-site scripting (XSS)
vulnerabilities in web applications. A (symbolic) path is given as
a finite sequence of assignments and assertions (i.e. without
loops), and checking its feasibility amounts to determining the
existence of inputs that yield a successful execution. Modern
programming languages (e.g. JavaScript, PHP, and Python) support
many complex string operations, and strings are also often
implicitly modified during a computation in some intricate fashion
(e.g. by some autoescaping mechanisms).
In this paper we provide two general semantic conditions which
together ensure the decidability of path feasibility: (1) each
assertion admits regular monadic decomposition (i.e.\ is an
effectively recognisable relation), and (2) each assignment uses a
(possibly nondeterministic) function whose inverse relation
preserves regularity. We show that the semantic conditions are
\emph{expressive} since they are satisfied by a multitude of
string operations including concatenation, one-way and two-way
finite-state transducers, $\replaceall$ functions (where the
replacement string could contain variables), string-reverse
functions, regular-expression matching, and some (restricted)
forms of letter-counting/length functions. The semantic
conditions also strictly subsume existing decidable string
theories (e.g. straight-line fragments, and acyclic logics), and
most existing benchmarks (e.g. most of Kaluza's, and all of
SLOG's, Stranger's, and SLOTH's benchmarks). Our semantic
conditions also yield a conceptually \emph{simple} decision
procedure, as well as an \emph{extensible} architecture of a
string solver in that a user may easily incorporate his/her own
string functions into the solver by simply providing code for the
pre-image computation without worrying about other parts of the
solver. Despite these, the semantic conditions are unfortunately
too general to provide a fast and complete decision procedure. We
provide strong theoretical evidence for this in the form of
complexity results. To rectify this problem, we propose two
solutions. Our main solution is to allow only partial string
functions (i.e., prohibit nondeterminism) in condition (2). This
restriction is satisfied in many cases in practice, and yields
decision procedures that are effective in both theory and
practice. Whenever nondeterministic functions are still needed
(e.g. the string function split), our second solution is to
provide a syntactic fragment that provides a support of
nondeterministic functions, and operations like one-way
transducers, $\replaceall$ (with constant replacement string), the
string-reverse function, concatenation, and regular-expression
matching. We show that this fragment can be reduced to an existing
solver SLOTH that exploits fast model checking algorithms like
IC3.
We provide an efficient implementation of our decision procedure
(assuming our first solution above, i.e., deterministic partial
string functions) in a new string solver OSTRICH. Our
implementation provides built-in support for concatenation,
reverse, functional transducers, and $\replaceall$ and provides a
framework for extensibility to support further string functions.
We demonstrate the efficacy of our new solver against other
competitive solvers.