Abstract
Automatic abstraction is a powerful software verification technique. In this paper, we elaborate an abstract domain for C strings, that is, null-terminated arrays of characters. We describe the abstract semantics of basic string operations and prove their soundness with regards to previously established concrete semantics of those operations. In addition to a selection of string functions from the standard C library, we provide semantics for character access and update, enabling automatic lifting of arbitrary string-manipulating code into the domain.
The domain we present (called M-String) has two other abstract domains as its parameters: an index (bound) domain and a character domain. Picking different constituent domains allows M-String to be tailored for specific verification tasks, balancing precision against complexity.
In addition to describing the domain theoretically, we also provide an executable implementation of the abstract operations. Using a tool which automatically lifts existing programs into the M-String domain along with an explicit-state model checker, we have evaluated the proposed domain experimentally on a few simple but realistic test programs.
This work has been partially supported by the Czech Science Foundation grant No. 18-02177S.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The string of interest of a character array is the sequence of characters up to the first null one (included). In the case in which the null character occurs at the first index of a character array, then its string of interest is defined as “null”. If the null character does not occur in the array, then its string of interest is defined as “undefined”. Otherwise, the string of interest is considered to be “well-defined”.
- 2.
For scalars in C programs, we use the bitvector theory.
- 3.
- 4.
The processor used to run the benchmarks was Intel Xeon E5-2630 clocked at 2.60 GHz. To make reproduction of the benchmarks easier, we provide instructions and scripts in the online supplementary material.
- 5.
The implementations were taken from pdclib, a public-domain libc implementation.
References
Polyspace, MathWorks (2001)
Static Code Analysis, OWASP (2017)
Interactive: the top programming languages 2018. IEEE Spectrum Magazine (2018)
Amadini, R., et al.: Reference abstract domains and applications to string analysis. Fundam. Inform. 158(4), 297–326 (2018)
Amadini, R., et al.: Combining string abstract domains for JavaScript analysis: an evaluation. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 41–57. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_3
Baranová, Z., et al.: Model checking of C and C++ with DIVINE 4. In: D’Souza, D., Narayan Kumar, K. (eds.) ATVA 2017. LNCS, vol. 10482, pp. 201–207. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68167-2_14
Bultan, T., Yu, F., Alkhalaf, M., Aydin, A.: String Analysis for Software Verification and Security. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68670-7
Christensen, A.S., Møller, A., Schwartzbach, M.I.: Precise analysis of string expressions. In: Cousot, R. (ed.) SAS 2003. LNCS, vol. 2694, pp. 1–18. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44898-5_1
Clarke, E.M., Grumberg, O., Long, D.E.: Model checking and abstraction. ACM Trans. Program. Lang. Syst. 16(5), 1512–1542 (1994)
Cortesi, A., Olliaro, M.: M-string segmentation: a refined abstract domain for string analysis in C programs. In: Proceedings of the 12th International Symposium on Theoretical Aspects of Software Engineering, TASE 2018, Guangzhou, China, 29–31 August 2018 (2018)
Costantini, G., Ferrara, P., Cortesi, A.: A suite of abstract domains for static analysis of string values. Softw. Pract. Exp. 45(2), 245–287 (2015)
Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, Los Angeles, California, USA, January 1977, pp. 238–252 (1977)
Cousot, P., Cousot, R., Logozzo, F.: A parametric segmentation functor for fully automatic and scalable array content analysis. In: Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, 26–28 January 2011, pp. 105–118 (2011)
Dor, N., Rodeh, M., Sagiv, S.: CSSV: towards a realistic tool for statically detecting all buffer overflows in C. In: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, San Diego, California, USA, 9–11 June 2003, pp. 155–167 (2003)
D’Silva, V., Kroening, D., Weissenbacher, G.: A survey of automated techniques for formal software verification. IEEE Trans. CAD Integr. Circ. Syst. 27(7), 1165–1178 (2008)
Evans, D., Larochelle, D.: Improving security using extensible lightweight static analysis. IEEE Softw. 19(1), 42–51 (2002)
Holzmann, G.J.: Static source code checking for user-defined properties. In: Integrated Design and Process Technology, IDPT 2002. Society for Design and Process Science, Pasadena (2002)
Jensen, S.H., Møller, A., Thiemann, P.: Type analysis for JavaScript. In: Palsberg, J., Su, Z. (eds.) SAS 2009. LNCS, vol. 5673, pp. 238–255. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03237-0_17
Jones, R.W.M., Kelly, P.H.J.: Backwards-compatible bounds checking for arrays and pointers in C programs. In: AADEBUG, pp. 13–26 (1997)
Kashyap, V., et al.: JSAI: a static analysis platform for JavaScript. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, (FSE-22), Hong Kong, China, 16–22 November 2014, pp. 121–132 (2014)
Kim, S.-W., Chin, W., Park, J., Kim, J., Ryu, S.: Inferring grammatical summaries of string values. In: Garrigue, J. (ed.) APLAS 2014. LNCS, vol. 8858, pp. 372–391. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12736-1_20
Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: International Symposium on Code Generation and Optimization (CGO 2004), Palo Alto, California, March 2004
Lauko, H., Ročkai, P., Barnat, J.: Symbolic computation via program transformation. In: Fischer, B., Uustalu, T. (eds.) ICTAC 2018. LNCS, vol. 11187, pp. 313–332. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02508-3_17
Madsen, M., Andreasen, E.: String analysis for dynamic field access. In: Cohen, A. (ed.) CC 2014. LNCS, vol. 8409, pp. 197–217. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54807-9_12
One, A.: Smashing the stack for fun and profit. Phrack 7(49) (1996 ). http://www.phrack.com/issues.html?issue=49&id=14
Park, C., Im, H., Ryu, S.: Precise and scalable static analysis of jQuery using a regular expression domain. In: Proceedings of the 12th Symposium on Dynamic Languages, DLS 2016, Amsterdam, The Netherlands, 1 November 2016, pp. 25–36 (2016)
Shahriar, H., Zulkernine, M.: Classification of static analysis-based buffer overflow detectors. In: Fourth International Conference on Secure Software Integration and Reliability Improvement, SSIRI 2010, Singapore, 9–11 June 2010, Companion Volume, pp. 94–101 (2010)
Spoto, F.: The julia static analyzer for Java. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 39–57. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_3
Wagner, D.A., Foster, J.S., Brewer, E.A., Aiken, A.: A first step towards automated detection of buffer overrun vulnerabilities. In: Proceedings of the Network and Distributed System Security Symposium, NDSS, San Diego, California, USA, p. 2000 (2000)
Xie, Y., Chou, A., Engler, D.R.: ARCHER: using symbolic, path-sensitive analysis to detect memory access errors. In: Proceedings of the 11th ACM SIGSOFT Symposium on Foundations of Software Engineering 2003 Held Jointly with 9th European Software Engineering Conference, ESEC/FSE 2003, Helsinki, Finland, 1–5 September 2003, pp. 327–336 (2003)
Xu, R-G., Godefroid, P., Majumdar, R.: Testing for buffer overflows with length abstraction. In: Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2008, Seattle, WA, USA, 20–24 July 2008, pp. 27–38 (2008)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cortesi, A., Lauko, H., Olliaro, M., Ročkai, P. (2019). String Abstraction for Model Checking of C Programs. In: Biondi, F., Given-Wilson, T., Legay, A. (eds) Model Checking Software. SPIN 2019. Lecture Notes in Computer Science(), vol 11636. Springer, Cham. https://doi.org/10.1007/978-3-030-30923-7_5
Download citation
DOI: https://doi.org/10.1007/978-3-030-30923-7_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30922-0
Online ISBN: 978-3-030-30923-7
eBook Packages: Computer ScienceComputer Science (R0)