
Sentence Embedding Models for Similarity Detection of Software Requirements

  • Original Research
  • Published in: SN Computer Science

Abstract

Semantic similarity detection mainly relies on laboriously curated ontologies, or on supervised and unsupervised neural embedding models. In this paper, we present two domain-specific sentence embedding models trained on a natural-language requirements dataset, yielding sentence embeddings tailored to the software requirements engineering domain. Both models use cosine similarity to score the semantic relatedness of sentence pairs. The results of the experimental evaluation confirm that the proposed models improve textual semantic similarity measures over existing state-of-the-art neural sentence embedding models: we reach an accuracy of 88.35%, an improvement of about 10% over existing benchmarks.
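The cosine-similarity scoring the abstract describes can be sketched as follows. This is a minimal illustration only: the vectors below are toy stand-ins for sentence embeddings, not outputs of the paper's actual models, and the `cosine_similarity` helper is an assumed name, not part of the authors' code.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for the embeddings of two requirement sentences.
req_a = [0.2, 0.1, 0.7]
req_b = [0.3, 0.0, 0.6]
score = cosine_similarity(req_a, req_b)  # close to 1.0 for similar requirements
```

In practice, each requirement sentence would first be mapped to a dense vector by the trained embedding model, and pairs whose cosine score exceeds a chosen threshold would be flagged as semantically similar.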






Acknowledgements

This work has been partially supported by the Project IN17MO07 “Formal Specification for Secured Software System”, under the Indo-Italian Executive Programme of Scientific and Technological Cooperation.

Author information


Corresponding author

Correspondence to Souvick Das.

Ethics declarations

Conflict of Interest Statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Applications of Software Engineering and Tool Support” guest edited by Nabendu Chaki, Agostino Cortesi and Anirban Sarkar.


About this article


Cite this article

Das, S., Deb, N., Cortesi, A. et al. Sentence Embedding Models for Similarity Detection of Software Requirements. SN COMPUT. SCI. 2, 69 (2021). https://doi.org/10.1007/s42979-020-00427-1

