
Treant: training evasion-aware decision trees


Abstract

Despite its success and popularity, machine learning is now recognized as vulnerable to evasion attacks, i.e., carefully crafted perturbations of test inputs designed to force prediction errors. In this paper we focus on evasion attacks against decision tree ensembles, which are among the most successful predictive models for non-perceptual problems. Although powerful and interpretable, decision tree ensembles have so far received only limited attention from the security and machine learning communities, leaving the state of the art in adversarial learning for these models sub-optimal. We thus propose Treant, a novel decision tree learning algorithm that, on the basis of a formal threat model, minimizes an evasion-aware loss function at each step of the tree construction. Treant is based on two key technical ingredients, robust splitting and attack invariance, which jointly guarantee the soundness of the learning process. Experimental results on publicly available datasets show that Treant generates decision tree ensembles that are both accurate and nearly insensitive to evasion attacks, outperforming state-of-the-art adversarial learning techniques.
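To give a concrete, if simplified, picture of what minimizing an evasion-aware loss during tree construction can look like, the Python sketch below scores candidate splits under a worst-case attacker. This is an illustration under stated assumptions, not the authors' algorithm: the perturbation model (a per-feature budget within which the attacker can move an instance across the split threshold), the choice to fit leaf values only on unmovable instances, and all function names are hypothetical.

    import numpy as np

    def sse(y, pred):
        """Sum of squared errors of labels y against a constant leaf prediction."""
        return float(np.sum((np.asarray(y) - pred) ** 2))

    def robust_split_loss(x, y, threshold, budget):
        """Worst-case (evasion-aware) loss of splitting feature x at threshold.

        Hypothetical attacker model: any instance whose feature value lies
        within `budget` of the threshold can be moved to either side; each
        such instance ends up on the side where its squared error is largest.
        """
        movable = np.abs(x - threshold) <= budget
        left_fixed = (x <= threshold) & ~movable
        right_fixed = (x > threshold) & ~movable

        # Simplification: fit the leaf predictions on unmovable instances only.
        left_pred = y[left_fixed].mean() if left_fixed.any() else 0.0
        right_pred = y[right_fixed].mean() if right_fixed.any() else 0.0

        loss = sse(y[left_fixed], left_pred) + sse(y[right_fixed], right_pred)
        for yi in y[movable]:  # attacker places each movable instance adversarially
            loss += max((yi - left_pred) ** 2, (yi - right_pred) ** 2)
        return loss

    def best_robust_split(x, y, budget):
        """Pick the candidate threshold (midpoints of unique values) with the
        smallest worst-case loss."""
        u = np.unique(x)
        thresholds = (u[:-1] + u[1:]) / 2
        return min(thresholds, key=lambda t: robust_split_loss(x, y, t, budget))

    # Toy usage: one numerical feature with regression-style labels.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
    print(best_robust_split(x, y, budget=0.5))

A standard greedy learner would score the same thresholds with the attacker-free loss; the evasion-aware variant can prefer a different split whose worst-case degradation is smaller, which is the intuition behind robust splitting.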




Notes

  1. The name comes from the role-playing game “Dungeons & Dragons”, where it denotes giant tree-like creatures.

  2. For simplicity, we only consider numerical features over \(\mathbb {R}\). However, our framework can be readily generalized to other use cases, e.g., categorical or ordinal features, which we support in our implementation and experiments.

  3. \(\mathcal{L}(t,\mathcal{D}) = (-2+1)^2+(-1+1)^2+(-1-0)^2+4\cdot (2-2)^2=2\).

  4. \(\mathcal{L}^A(t,\mathcal{D}) = (-2+1)^2+(-1+1)^2+(2-0)^2+4\cdot (2-2)^2=5\).

  5. \(\mathcal{L}^A(t,\mathcal{D}) = (-2+1.5)^2+(-1+1.5)^2+(0-1.6)^2+4\cdot (2-1.6)^2=3.7\).

  6. \(\mathcal{L}^A(t,\mathcal{D}) = (-2+1.5)^2+(-1+1.5)^2+(0-1.5)^2+(2-1)^2+3\cdot (2-2)^2=3.75\).

  7. \(\mathcal{L}^A(t,\mathcal{D}) = (-2+1.5)^2+(-1+1.5)^2+(0-1.5)^2+(2-1.5)^2+3\cdot (2-2)^2=3\) (the sums in notes 3–7 are re-checked numerically in the snippet after these notes).

  8. The pointwise maximum and the sum of convex functions preserve convexity.

  9. We use the symbol \(\lessgtr \) to stand for either \(\le \) or \(\ge \) when the distinction is unimportant.

  10. https://github.com/microsoft/LightGBM.

  11. https://archive.ics.uci.edu/ml/datasets/census+income.

  12. https://www.kaggle.com/c/uci-wine-quality-dataset/data.

  13. https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.

  14. https://ieee-dataport.org/open-access/malware-analysis-datasets-top-1000-pe-imports.

  15. The wine dataset was originally conceived for a multiclass classification problem; we turned that into a binary one, where the positive class identifies good-quality wines (i.e., those whose quality is at least 6, on a 0–10 scale) and the negative class contains the remaining instances.
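As a quick arithmetic check on notes 3–7, the short script below re-evaluates each sum of squared errors term by term. The terms are copied verbatim from the notes; the dictionary keys are only display labels.

    # Term-by-term recomputation of the squared-error sums in notes 3-7.
    losses = {
        "note 3: L(t, D)":   [(-2 + 1) ** 2, (-1 + 1) ** 2, (-1 - 0) ** 2, 4 * (2 - 2) ** 2],
        "note 4: L^A(t, D)": [(-2 + 1) ** 2, (-1 + 1) ** 2, (2 - 0) ** 2, 4 * (2 - 2) ** 2],
        "note 5: L^A(t, D)": [(-2 + 1.5) ** 2, (-1 + 1.5) ** 2, (0 - 1.6) ** 2, 4 * (2 - 1.6) ** 2],
        "note 6: L^A(t, D)": [(-2 + 1.5) ** 2, (-1 + 1.5) ** 2, (0 - 1.5) ** 2, (2 - 1) ** 2, 3 * (2 - 2) ** 2],
        "note 7: L^A(t, D)": [(-2 + 1.5) ** 2, (-1 + 1.5) ** 2, (0 - 1.5) ** 2, (2 - 1.5) ** 2, 3 * (2 - 2) ** 2],
    }
    for label, terms in losses.items():
        print(f"{label} = {sum(terms):g}")  # prints 2, 5, 3.7, 3.75, 3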


Author information


Corresponding author

Correspondence to Claudio Lucchese.

Additional information

Responsible editor: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Calzavara, S., Lucchese, C., Tolomei, G. et al. Treant: training evasion-aware decision trees. Data Min Knowl Disc 34, 1390–1420 (2020). https://doi.org/10.1007/s10618-020-00694-9

