research-article

A Supervised Learning Approach to Protect Client Authentication on the Web

Published: 12 June 2015

Abstract

Browser-based defenses have recently been advocated as an effective mechanism to protect potentially insecure web applications against the threats of session hijacking, fixation, and related attacks. In existing approaches, all such defenses ultimately rely on client-side heuristics to automatically detect cookies containing session information, to then protect them against theft or otherwise unintended use. While clearly crucial to the effectiveness of the resulting defense mechanisms, these heuristics have not, as yet, undergone any rigorous assessment of their adequacy. In this article, we conduct the first such formal assessment, based on a ground truth of 2,464 cookies we collect from 215 popular websites of the Alexa ranking.

To obtain the ground truth, we devise a semiautomatic procedure that draws on the novel notion of authentication token, which we introduce to capture multiple web authentication schemes. We test existing browser-based defenses in the literature against our ground truth, unveiling several pitfalls both in the heuristics adopted and in the methods used to assess them. We then propose a new detection method based on supervised learning, where our ground truth is used to train a set of binary classifiers, and report on experimental evidence that our method outperforms existing proposals. Interestingly, the resulting classifiers, together with our hands-on experience in the construction of the ground truth, provide new insight on how web authentication is actually implemented in practice.
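As a rough illustration of the supervised approach described above, the sketch below trains a single binary classifier to label cookies as authentication cookies or not. It is a minimal sketch under stated assumptions, not the authors' method: the feature set (a name hint, the Shannon entropy of the value, the HttpOnly flag), the `NAME_HINTS` fragments, the toy cookies, and the use of a perceptron as the classifier are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Hypothetical name fragments; NOT the feature set used in the article.
NAME_HINTS = ("sess", "auth", "token", "sid")


def shannon_entropy(value):
    """Shannon entropy of the characters in value, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def features(cookie):
    """Map a cookie (dict with name, value, http_only) to a feature vector."""
    name = cookie["name"].lower()
    return [
        1.0,                                                  # bias term
        1.0 if any(h in name for h in NAME_HINTS) else 0.0,   # name hint
        shannon_entropy(cookie["value"]) / 8.0,               # scaled entropy
        1.0 if cookie["http_only"] else 0.0,                  # HttpOnly flag
    ]


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(samples, labels, max_epochs=1000):
    """Train a perceptron; stops early once an epoch makes no mistakes."""
    w = [0.0] * len(samples[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            pred = 1 if score(w, x) > 0 else 0
            if pred != y:
                sign = 1.0 if y == 1 else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def predict(w, cookie):
    """Return 1 if the cookie is classified as an authentication cookie."""
    return 1 if score(w, features(cookie)) > 0 else 0


# Made-up toy ground truth (1 = authentication cookie); illustration only.
cookies = [
    {"name": "PHPSESSID", "value": "k3j9x0q7v2m8z1c5", "http_only": True},
    {"name": "auth_token", "value": "9f8e7d6c5b4a3210", "http_only": True},
    {"name": "lang", "value": "en", "http_only": False},
    {"name": "_ga", "value": "GA1.2.1111.2222", "http_only": False},
    {"name": "theme", "value": "dark", "http_only": False},
]
labels = [1, 1, 0, 0, 0]

w = train_perceptron([features(c) for c in cookies], labels)
predictions = [predict(w, c) for c in cookies]
```

The toy data is linearly separable by construction, so the perceptron fits it exactly. A real ground truth would need a more careful feature set, handling of class imbalance, and evaluation on held-out cookies.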



        • Published in

          ACM Transactions on the Web, Volume 9, Issue 3
          June 2015
          187 pages
          ISSN: 1559-1131
          EISSN: 1559-114X
          DOI: 10.1145/2788341

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 12 June 2015
          • Revised: 1 March 2015
          • Accepted: 1 March 2015
          • Received: 1 October 2014


          Qualifiers

          • research-article
          • Research
          • Refereed
