Skip to main content

Twinning Automata and Regular Expressions for String Static Analysis

  • Conference paper
  • First Online:
Verification, Model Checking, and Abstract Interpretation (VMCAI 2021)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12597))

Abstract

In this paper we formalize \(\textsc {Tarsis}\), a new abstract domain based on the abstract interpretation theory that approximates string values through finite state automata. The main novelty of \(\textsc {Tarsis}\) is that it works over an alphabet of strings instead of single characters. On the one hand, such an approach requires a more complex and refined definition of the widening operator, and the abstract semantics of string operators. On the other hand, it is in position to obtain strictly more precise results than state-of-the-art approaches. We implemented a prototype of \(\textsc {Tarsis}\), and we applied it to some case studies taken from some of the most popular Java libraries manipulating string values. The experimental results confirm that \(\textsc {Tarsis}\) is in position to obtain strictly more precise results than existing analyses.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://commons.apache.org/proper/commons-lang/.

  2. 2.

    The set \(\varSigma ^*_\mathsf {P}\) can be easily computed collecting the constant strings in \(\mathsf {P}\) by visiting its abstract syntax tree and then computing their substrings.

  3. 3.

    For the sake of readability, in the program examples presented in this paper the plus operation between strings corresponds to the string concatenation.

  4. 4.

    The abstract semantics of concat does not add any further important technical detail to the paper hence it is not reported.

  5. 5.

    Note that this is a decidable check since \(\mathtt{A}\) and \(\mathtt{A}'\) are cycle-free, otherwise the interval \([-1, +\infty ]\) would be returned in the first case.

  6. 6.

    \(\mathsf {\textsc {Fa}_{/\equiv }}\) source code: https://github.com/SPY-Lab/fsa.

  7. 7.

    \(\textsc {Tarsis}\) source code: https://github.com/UniVE-SSV/tarsis.

References

  1. Abdulla, P.A., et al.: Efficient handling of string-number conversion. In: Donaldson, A.F., Torlak, E. (eds.) Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, 15–20 June 2020, pp. 943–957. ACM (2020). https://doi.org/10.1145/3385412.3386034

  2. Abdulla, et al.: String constraints for verification. In: Biere, A., Bloem, R. (eds.) Computer Aided Verification - 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, 18–22 July 2014. Proceedings. Lecture Notes in Computer Science, vol. 8559, pp. 150–166. Springer (2014). https://doi.org/10.1007/978-3-319-08867-9_10

  3. Abdulla, P.A., Atig, M.F., Diep, B.P., Holík, L., Janku, P.: Chain-free string constraints. In: Chen, Y., Cheng, C., Esparza, J. (eds.) Automated Technology for Verification and Analysis - 17th International Symposium, ATVA 2019, Taipei, Taiwan, 28–31 October 2019, Proceedings. Lecture Notes in Computer Science, vol. 11781, pp. 277–293. Springer (2019). https://doi.org/10.1007/978-3-030-31784-3_16

  4. Almashfi, N., Lu, L.: Precise string domain for analyzing javascript arrays and objects. In: 2020 3rd International Conference on Information and Computer Technologies (ICICT), pp. 17–23 (2020)

    Google Scholar 

  5. Amadini, R., Gange, G., Stuckey, P.J.: Dashed strings for string constraint solving. Artif. Intell. 289, 103368 (2020). https://doi.org/10.1016/j.artint.2020.103368

    Article  MathSciNet  Google Scholar 

  6. Arceri, V., Maffeis, S.: Abstract domains for type juggling. Electron. Notes Theor. Comput. Sci. 331, 41–55 (2017). https://doi.org/10.1016/j.entcs.2017.02.003

    Article  MATH  Google Scholar 

  7. Arceri, V., Mastroeni, I.: A sound abstract interpreter for dynamic code. In: Hung, C., Cerný, T., Shin, D., Bechini, A. (eds.) SAC 2020: The 35th ACM/SIGAPP Symposium on Applied Computing, online event, [Brno, Czech Republic], 30 March–3 April 2020. pp. 1979–1988. ACM (2020). https://doi.org/10.1145/3341105.3373964

  8. Arceri, V., Mastroeni, I., Xu, S.: Static analysis for Ecmascript string manipulation programs. Appl. Sci. 10, 3525 (2020). https://doi.org/10.3390/app10103525

    Article  Google Scholar 

  9. Arceri, V., Olliaro, M., Cortesi, A., Mastroeni, I.: Completeness of abstract domains for string analysis of javascript programs. In: Hierons, R.M., Mosbah, M. (eds.) Theoretical Aspects of Computing - ICTAC 2019–16th International Colloquium, Hammamet, Tunisia, 31 October–4 November 2019, Proceedings. Lecture Notes in Computer Science, vol. 11884, pp. 255–272. Springer (2019). https://doi.org/10.1007/978-3-030-32505-3_15

  10. Bartzis, C., Bultan, T.: Widening arithmetic automata. In: Alur, R., Peled, D.A. (eds.) Computer Aided Verification, 16th International Conference, CAV 2004, Boston, MA, USA, 13–17 July 2004, Proceedings. Lecture Notes in Computer Science, vol. 3114, pp. 321–333. Springer (2004). https://doi.org/10.1007/978-3-540-27813-9_25

  11. Chen, T., Hague, M., Lin, A.W., Rümmer, P., Wu, Z.: Decision procedures for path feasibility of string-manipulating programs with complex operations. Proc. ACM Program. Lang. 3(POPL), 49:1–49:30 (2019). https://doi.org/10.1145/3290362

  12. Choi, T., Lee, O., Kim, H., Doh, K.: A practical string analyzer by the widening approach. In: Kobayashi, N. (ed.) Programming Languages and Systems, 4th Asian Symposium, APLAS 2006, Sydney, Australia, 8–10 November 2006, Proceedings. Lecture Notes in Computer Science, vol. 4279, pp. 374–388. Springer (2006). https://doi.org/10.1007/11924661_23

  13. Christensen, A.S., Møller, A., Schwartzbach, M.I.: Precise analysis of string expressions. In: Cousot, R. (ed.) Static Analysis, 10th International Symposium, SAS 2003, San Diego, CA, USA, 11–13 June 2003, Proceedings. Lecture Notes in Computer Science, vol. 2694, pp. 1–18. Springer (2003). https://doi.org/10.1007/3-540-44898-5_1

  14. Cortesi, A., Olliaro, M.: M-string segmentation: A refined abstract domain for string analysis in C programs. In: Pang, J., Zhang, C., He, J., Weng, J. (eds.) 2018 International Symposium on Theoretical Aspects of Software Engineering, TASE 2018, Guangzhou, China, 29–31 August 2018, pp. 1–8. IEEE Computer Society (2018). https://doi.org/10.1109/TASE.2018.00009

  15. Cortesi, A., Zanioli, M.: Widening and narrowing operators for abstract interpretation. Comput. Lang. Syst. Struct. 37(1), 24–42 (2011). https://doi.org/10.1016/j.cl.2010.09.001

    Article  MATH  Google Scholar 

  16. Costantini, G., Ferrara, P., Cortesi, A.: A suite of abstract domains for static analysis of string values. Softw. Pract. Exp. 45(2), 245–287 (2015). https://doi.org/10.1002/spe.2218

    Article  Google Scholar 

  17. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Graham, R.M., Harrison, M.A., Sethi, R. (eds.) Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, Los Angeles, California, USA, January 1977, pp. 238–252. ACM (1977). https://doi.org/10.1145/512950.512973

  18. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In: Aho, A.V., Zilles, S.N., Rosen, B.K. (eds.) Conference Record of the Sixth Annual ACM Symposium on Principles of Programming Languages, San Antonio, Texas, USA, January 1979, pp. 269–282. ACM Press (1979). https://doi.org/10.1145/567752.567778

  19. Cousot, P., Cousot, R.: Abstract interpretation frameworks. J. Log. Comput. 2(4), 511–547 (1992). https://doi.org/10.1093/logcom/2.4.511

    Article  MathSciNet  MATH  Google Scholar 

  20. Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Aho, A.V., Zilles, S.N., Szymanski, T.G. (eds.) Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, Tucson, Arizona, USA, January 1978, pp. 84–96. ACM Press (1978). https://doi.org/10.1145/512760.512770

  21. D’Antoni, L., Veanes, M.: Minimization of symbolic automata. In: Jagannathan, S., Sewell, P. (eds.) The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2014, San Diego, CA, USA, 20–21 January 2014, pp. 541–554. ACM (2014). https://doi.org/10.1145/2535838.2535849

  22. Davis, M.D., Sigal, R., Weyuker, E.J.: Computability, Complexity, and Languages: Fund. of Theor. CS. Academic Press Professional, Inc. (1994)

    Google Scholar 

  23. D’Silva, V.: Widening for Automata. MsC Thesis, Inst. Fur Inform. - UZH (2006)

    Google Scholar 

  24. Giacobazzi, R., Ranzato, F., Scozzari, F.: Making abstract interpretations complete. J. ACM 47(2), 361–416 (2000). https://doi.org/10.1145/333979.333989

  25. Madsen, M., Andreasen, E.: String analysis for dynamic field access. In: Cohen, A. (ed.) Compiler Construction - 23rd International Conference, CC 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, 5–13 April, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8409, pp. 197–217. Springer (2014). https://doi.org/10.1007/978-3-642-54807-9_12

  26. Mastroeni, I., Nikolic, D.: Abstract program slicing: From theory towards an implementation. In: Dong, J.S., Zhu, H. (eds.) Formal Methods and Software Engineering - 12th International Conference on Formal Engineering Methods, ICFEM 2010, Shanghai, China, 17–19 November 2010. Proceedings. Lecture Notes in Computer Science, vol. 6447, pp. 452–467. Springer (2010). https://doi.org/10.1007/978-3-642-16901-4_30

  27. Mastroeni, I., Zanardini, D.: Abstract program slicing: an abstract interpretation-based approach to program slicing. ACM Trans. Comput. Log. 18(1), 7:1–7:58 (2017). https://doi.org/10.1145/3029052

  28. Midtgaard, J., Nielson, F., Nielson, H.R.: A parametric abstract domain for lattice-valued regular expressions. In: Rival, X. (ed.) Static Analysis - 23rd International Symposium, SAS 2016, Edinburgh, UK, 8–10 September 2016, Proceedings. Lecture Notes in Computer Science, vol. 9837, pp. 338–360. Springer (2016). https://doi.org/10.1007/978-3-662-53413-7_17

  29. Park, C., Im, H., Ryu, S.: Precise and scalable static analysis of jquery using a regular expression domain. In: Ierusalimschy, R. (ed.) Proceedings of the 12th Symposium on Dynamic Languages, DLS 2016, Amsterdam, The Netherlands, 1 November 2016, pp. 25–36. ACM (2016). https://doi.org/10.1145/2989225.2989228

  30. Preda, M.D., Giacobazzi, R., Lakhotia, A., Mastroeni, I.: Abstract symbolic automata: mixed syntactic/semantic similarity analysis of executables. In: Rajamani, S.K., Walker, D. (eds.) Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, 15–17 January 2015, pp. 329–341. ACM (2015). https://doi.org/10.1145/2676726.2676986

  31. Rival, X., Mauborgne, L.: The trace partitioning abstract domain. ACM Trans. Program. Lang. Syst. 29(5), 26-es, August 2007. https://doi.org/10.1145/1275497.1275501

  32. Veanes, M.: Applications of symbolic finite automata. In: Konstantinidis, S. (ed.) Implementation and Application of Automata - 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16–19, 2013. Proceedings. Lecture Notes in Computer Science, vol. 7982, pp. 16–23. Springer (2013). https://doi.org/10.1007/978-3-642-39274-0_3

  33. Wang, H., Chen, S., Yu, F., Jiang, J.R.: A symbolic model checking approach to the analysis of string and length constraints. In: Huchard, M., Kästner, C., Fraser, G. (eds.) Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, 3–7 September 2018, pp. 623–633. ACM (2018). https://doi.org/10.1145/3238147.3238189

  34. Yu, F., Alkhalaf, M., Bultan, T., Ibarra, O.H.: Automata-based symbolic string analysis for vulnerability detection. Formal Meth. Syst. Des. 44(1), 44–70 (2013). https://doi.org/10.1007/s10703-013-0189-1

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Vincenzo Arceri .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Negrini, L., Arceri, V., Ferrara, P., Cortesi, A. (2021). Twinning Automata and Regular Expressions for String Static Analysis. In: Henglein, F., Shoham, S., Vizel, Y. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2021. Lecture Notes in Computer Science(), vol 12597. Springer, Cham. https://doi.org/10.1007/978-3-030-67067-2_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-67067-2_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67066-5

  • Online ISBN: 978-3-030-67067-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics