DOI: 10.1145/3524458.3547246
research-article
Open Access

Unconventional data for policy: Using Big Data for detecting Italian innovative SMEs

Published: 07 September 2022

ABSTRACT

The paper explores the possibility of employing the source code of corporate websites as an information source for research in innovation studies. Research in this area generally relies on patent data or official data sources. Our paper links standard firm-level economic information with web-based data and contributes to the ongoing debate in three ways. First, whereas most of the literature has focused on the linguistic content of web pages, we mostly use HTML tags. Second, we propose a method to assess the quality of the linkage between web data and firm-level information. Third, we show that the data retrieved from corporate websites can help identify ‘innovative SMEs’.

          Published in

          GoodIT '22: Proceedings of the 2022 ACM Conference on Information Technology for Social Good
          September 2022
          436 pages
          ISBN: 9781450392846
          DOI: 10.1145/3524458

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article
          • Research
          • Refereed limited
