A family of experiments to assess the effectiveness and efficiency of source code obfuscation techniques

Ceccato, Mariano; Di Penta, Massimiliano; Falcarin, Paolo; Ricca, Filippo; Torchiano, Marco; Tonella, Paolo

doi:10.1007/s10664-013-9248-x

Published: 23 February 2013

Volume 19, pages 1040–1074, (2014)
Empirical Software Engineering

Mariano Ceccato¹, Massimiliano Di Penta², Paolo Falcarin³, Filippo Ricca⁴, Marco Torchiano⁵ & Paolo Tonella¹

Abstract

Context: code obfuscation is intended to obstruct code understanding and, in turn, to delay malicious code changes and ultimately render them uneconomical. Although code understanding cannot be completely impeded, code obfuscation makes it more laborious and troublesome, so as to discourage or retard code tampering. Despite the extensive adoption of obfuscation, its assessment has been addressed indirectly, either by using internal metrics or by taking the point of view of code analysis, e.g., considering the associated computational complexity. To the best of our knowledge, there is no publicly available user study that measures the cost of understanding obfuscated code from the point of view of a human attacker. Aim: this paper experimentally assesses the impact of code obfuscation on the capability of human subjects to understand and change source code. In particular, it considers code protected with two well-known code obfuscation techniques, i.e., identifier renaming and opaque predicates. Method: we conducted a family of five controlled experiments, involving undergraduate and graduate students from four universities. During the experiments, subjects had to perform comprehension or attack tasks on decompiled clients of two Java network-based applications, either obfuscated using one of the two techniques, or not. To assess and compare the obfuscation techniques, we measured the correctness and the efficiency of the performed task. Results: at least for the tasks we considered, simpler techniques (i.e., identifier renaming) prove to be more effective than more complex ones (i.e., opaque predicates) in preventing subjects from completing attack tasks.
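
To make the two techniques named in the abstract concrete, the following Java sketch shows one small method in three forms: readable, with identifiers renamed, and guarded by an opaque predicate. The class, the method, its names, and the chosen predicate are hypothetical illustrations; they are not taken from the applications or the obfuscation tools used in the experiments.

```java
public class ObfuscationSketch {

    // Readable baseline (hypothetical example): subtract a per-lap penalty
    // from the remaining fuel, never going below zero.
    static int applyFuelPenalty(int fuel, int laps) {
        int penalty = laps * 3;
        return Math.max(0, fuel - penalty);
    }

    // Identifier renaming: identical logic, but every meaningful name has
    // been replaced by a meaningless one, so the intent is no longer
    // visible from the names.
    static int a(int b, int c) {
        int d = c * 3;
        return Math.max(0, b - d);
    }

    // Opaque predicate: the condition below always holds, because the
    // product of two consecutive integers is even (this parity also
    // survives int overflow). A reader has to work that out before
    // realising that the else branch is dead code.
    static int applyFuelPenaltyOpaque(int fuel, int laps) {
        int x = laps;
        if ((x * (x + 1)) % 2 == 0) {
            int penalty = laps * 3;
            return Math.max(0, fuel - penalty);
        } else {
            // Never executed: bogus logic inserted only to mislead.
            return fuel + laps;
        }
    }

    public static void main(String[] args) {
        // All three variants compute the same result for the same input.
        System.out.println(applyFuelPenalty(100, 4));
        System.out.println(a(100, 4));
        System.out.println(applyFuelPenaltyOpaque(100, 4));
    }
}
```

All three methods behave identically; only the effort needed to read them differs, which is the quantity the experiments measure through task correctness and efficiency.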

Notes

http://sandmark.cs.arizona.edu/
http://www.kpdus.com/jad.html
As already mentioned in Section 2, we restrict to decompilable opaque predicates.
CarRace was developed by one of the authors as a case study application for a previous work (Ceccato et al. 2007).
ChatClient is an open source project available at http://sourceforge.net/projects/jchat.
Subjects used decompiled code rather than source code because, in a realistic attack, they cannot access the source code, but they can decompile the binary or the bytecode.
http://selab.fbk.eu/ceccato/replication_packages/id_renaming_vs_opaque_predicates_package.tgz
The goal of feature location (Eisenbarth et al. 2003) is to identify the computational units (e.g., procedures, class methods) that specifically implement a feature (e.g., requirement) of interest.
http://selab.fbk.eu/ceccato/replication_packages/id_renaming_vs_opaque_predicates_package.tgz

References

Anckaert B, Madou M, Sutter BD, Bus BD, Bosschere KD, Preneel B (2007) Program obfuscation: a quantitative approach. In: QoP ’07: Proc. of the 2007 ACM workshop on quality of protection. ACM, New York, NY, USA, pp 15–20. doi:10.1145/1314257.1314263
Baker RD (1995) Modern permutation test software. In: Edgington E (ed) Randomization tests. Marcel Dekker
Ceccato M, Di Penta M, Nagra J, Falcarin P, Ricca F, Torchiano M, Tonella P (2009a) The effectiveness of source code obfuscation: an experimental assessment. In: IEEE 17th international conference on program comprehension (ICPC), pp 178–187. doi:10.1109/ICPC.2009.5090041
Ceccato M, Di Penta M, Nagra J, Falcarin P, Ricca F, Torchiano M, Tonella P (2009b) The effectiveness of source code obfuscation: an experimental assessment. Tech. rep., University of Sannio. URL http://www.rcost.unisannio.it/mdipenta/icpc09-tr.pdf
Ceccato M, Preda MD, Nagra J, Collberg C, Tonella P (2007) Barrier slicing for remote software trusting. In: Proc. of the 7th IEEE international working conference on source code analysis and manipulation (SCAM 2007). IEEE Computer Society, pp 27–36. (Sept. 30 2007–Oct. 1 2007). doi:10.1109/SCAM.2007.4362895
Chang H, Atallah M (2002) Protecting software code by guards. In: ACM workshop on security and privacy in digital rights management. ACM
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale, NJ
Collberg C, Nagra J (2009) Surreptitious software: obfuscation, watermarking, and tamperproofing for software protection, 1st edn. Addison-Wesley Professional
Collberg C, Thomborson C, Low D (1997) A taxonomy of obfuscating transformations. Technical Report 148, Dept. of Computer Science, The Univ. of Auckland
Collberg C, Thomborson C, Low D (1998) Manufacturing cheap, resilient, and stealthy opaque constructs. In: POPL ’98: Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on principles of programming languages. ACM, New York, NY, USA, pp 184–196. doi:10.1145/268946.268962
Cordy J (2006) The TXL source transformation language. Sci Comput Program 61(3):190–210
Devore JL (2007) Probability and statistics for engineering and the sciences, 7th edn. Duxbury Press
Eisenbarth T, Koschke R, Simon D (2003) Locating features in source code. IEEE Trans Softw Eng 29(3):195–209
Falcarin P, Collberg C, Atallah M, Jakubowski M (2011) Guest editors’ introduction: software protection. IEEE Softw 28(2):24–27
Falcarin P, Scandariato R, Baldi M (2006) Remote trust with aspect oriented programming. In: IEEE advanced information and networking applications (AINA-06). IEEE
Fiutem R, Tonella P, Antoniol G, Merlo E (1999) Points-to analysis for program understanding. J Syst Softw 44(3):213–227
Goto H, Mambo M, Matsumura K, Shizuya H (2000) An approach to the objective and quantitative evaluation of tamper-resistant software. In: 3rd int. workshop on information security (ISW2000). Springer, pp 82–96
Grissom RJ, Kim JJ (2005) Effect sizes for research: a broad practical approach, 2nd edn. Lawrence Erlbaum Associates
Horne B, Matheson L, Sheehan C, Tarjan RE (2001) Dynamic self-checking techniques for improved tamper resistance. In: ACM workshop on security and privacy in digital rights management. ACM
Iversen G, Norpoth H (1987) Analysis of variance, 2nd edn. Sage Publications
Juristo N, Moreno A (2001) Basics of software engineering experimentation. Kluwer Academic Publishers, Englewood Cliffs, NJ
Motulsky H (2010) Intuitive biostatistics: a nonmathematical guide to statistical thinking. Oxford University Press. http://books.google.it/books?id=R477U5bAZs4C
Oppenheim AN (1992) Questionnaire design, interviewing and attitude measurement. Pinter, London
R Core Team (2012) R: a language and environment for statistical computing. Vienna, Austria. http://www.R-project.org. ISBN 3-900051-07-0
Ricca F, Di Penta M, Torchiano M, Tonella P, Ceccato M (2010) How developers’ experience and ability influence web application comprehension tasks supported by UML stereotypes: a series of four experiments. IEEE Trans Softw Eng 36:96–118. doi:10.1109/TSE.2009.69
Ricca F, Di Penta M, Torchiano M, Tonella P, Ceccato M, Visaggio CA (2008) Are fit tables really talking?: a series of experiments to understand whether fit tables are useful during evolution tasks. In: 30th International Conference on Software Engineering (ICSE 2008), pp 361–370
Ricca F, Torchiano M, Di Penta M, Ceccato M, Tonella P (2009) Using acceptance tests as a support for clarifying requirements: a series of experiments. Inf Softw Technol 51:270–283
Scandariato R, Ofek Y, Falcarin P, Baldi M (2008) Application-oriented trust in distributed computing. In: 3rd international conference on availability, reliability and security, ARES 08. IEEE, pp 434–439
Sheskin D (2007) Handbook of parametric and nonparametric statistical procedures, 4th edn. Chapman & Hall
Sutherland I, Kalb GE, Blyth A, Mulley G (2006) An empirical examination of the reverse engineering process for binary files. Comput Secur 25(3):221–228
Tyma P (2000) Method for renaming identifiers of a computer program. US Patent 6,102,966
Udupa S, Debray S, Madou M (2005) Deobfuscation: reverse engineering obfuscated code. In: 12th working conference on reverse engineering. doi:10.1109/WCRE.2005.13
Wohlin C, Runeson P, Höst M, Ohlsson M, Regnell B, Wesslén A (2000) Experimentation in software engineering—an introduction. Kluwer Academic Publishers

Author information

Authors and Affiliations

Cit, Fondazione Bruno Kessler, Trento, Italy
Mariano Ceccato & Paolo Tonella
Department of Engineering, University of Sannio, Benevento, Italy
Massimiliano Di Penta
School of Architecture, Computing and Engineering, University of East London, London, UK
Paolo Falcarin
DIBRIS, University of Genova, Genova, Italy
Filippo Ricca
Politecnico di Torino, Torino, Italy
Marco Torchiano

Corresponding author

Correspondence to Mariano Ceccato.

Additional information

Communicated by: Martin Robillard

About this article

Cite this article

Ceccato, M., Di Penta, M., Falcarin, P. et al. A family of experiments to assess the effectiveness and efficiency of source code obfuscation techniques. Empir Software Eng 19, 1040–1074 (2014). https://doi.org/10.1007/s10664-013-9248-x

Published: 23 February 2013
Issue Date: August 2014
DOI: https://doi.org/10.1007/s10664-013-9248-x
