Skip to main content

Mining Frequent Closed Itemsets from Distributed Repositories

  • Conference paper
Knowledge and Data Management in GRIDs

Abstract

In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting like a Grid. The extraction of frequent (closed) item-sets is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules. We figure out an environment where different datasets are stored in different sites. We assume that, due to the huge size of datasets and privacy concerns, dataset partitions cannot be moved to a centralized site where to materialize the whole dataset and perform the mining task. Thus it becomes mandatory to perform separate mining at each site, and then merge local results for deriving global knowledge.

This paper shows how frequent closed itemsets, mined independently at each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones because some globally frequent closed itemsets might be not locally frequent in some partitions. To avoid an expensive post-processing phase, needed to compute exact global results, we use a method to approximate the supports of closed itemsets. The approximation is only needed for those globally (closed) frequent itemsets which are locally infrequent on some dataset partitions, and thus are not returned at all from the corresponding sites.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Proc. of the 1st Workshop on Frequent Itemset Mining Implementations (FIMI’03). 2003.

    Google Scholar 

  2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. VLDB’ 94, pages 487–499, September 1994.

    Google Scholar 

  3. Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. In Joan Peckham, editor, SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13–15, 1997, Tucson, Arizona, USA, pages 255–264. ACM Press, 05 1997.

    Google Scholar 

  4. E-H. S. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In IEEE Transaction on Knowledge and Data Engineering, 2000.

    Google Scholar 

  5. Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD’ 00, pages 1–12, 2000.

    Google Scholar 

  6. H. Kargupta and P. Chan (Eds.). Advances in Distributed and Parallel Knowledge Discovery. AAAI Press/The MIT Press, 2004.

    Google Scholar 

  7. J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic Projection. In Proc. 2002 Int. Conf. on Knowledge Discovery in Databases (KDD’ 02), Edmonton, Canada, 2002.

    Google Scholar 

  8. Claudio Lucchese, Salvatore Orlando, and Raffaele Perego. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, 18(1):21–36, January 2006.

    Article  Google Scholar 

  9. A. Mueller. Fast sequential and parallel algorithms for association rules mining: A comparison. Technical Report CS-TR-3515, Univ. of Maryland, 1995.

    Google Scholar 

  10. S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. Adaptive and resource-aware mining of frequent sets. In Proc. The 2002 IEEE International Conference on Data Mining (ICDM’ 02), pages 338–345, 2002.

    Google Scholar 

  11. Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, and Mohammed Javeed Zaki. Carpenter: finding closed patterns in long biological datasets. In KDD, pages 637–642, 2003.

    Google Scholar 

  12. B. Park and H. Kargupta. Distributed Data Mining: Algorithms, Systems, and Applications. In Data Mining Handbook, pages 341–358. IEA, 2002.

    Google Scholar 

  13. Srinivasan Parthasarathy. Efficient progressive sampling for association rules. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), page 354. IEEE Computer Society, 2002.

    Google Scholar 

  14. Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1):25–46, 1999.

    Article  Google Scholar 

  15. J. Pei, J. Han, H. Lu, S. Nishio, and D. Tang, S. amd Yang. H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases. In Proc. The 2001 IEEE International Conference on Data Mining (ICDM’01), San Jose, CA, USA, 2000.

    Google Scholar 

  16. Jian Pei, Jiawei Han, and Jianyong Wang. Closet+: Searching for the best strategies for mining frequent closed itemsets. In SIGKDD’ 03, August 2003.

    Google Scholar 

  17. Ashoka Savasere, Edward Omiecinski, and Shamkant B. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB’95, Proceedings of 21th International Conference on Very Large Data Bases, pages 432–444. Morgan Kaufmann, September 1995.

    Google Scholar 

  18. C. Silvestri and S. Orlando. Distributed Approximate Mining of Frequent Patterns. In Proc. of the 2005 ACM Symposium on Applied Computing, SAC 2005, special track on Data Mining, pages 529–536, 2005.

    Google Scholar 

  19. C. Silvestri and S. Orlando. Approximate mining of frequent patterns on streams. Intelligent Data Analysis, 2006. To appear.

    Google Scholar 

  20. Rafik Taouil, Nicolas Pasquier, Yves Bastide, Lotfi Lajhal, and Gerd Stumme. Mining freqent patterns with counting inference. SIGKDD Explorations, 2(2):66–75, December 2000.

    Article  Google Scholar 

  21. Rafik Taouil, Nicolas Pasquier, Yves Bastide, and Lotfi Lakhal. Mining bases for association rules using closed sets. In ICDE, 2000.

    Google Scholar 

  22. M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12:372–390, May/June 2000.

    Article  Google Scholar 

  23. Mohammed J. Zaki. Mining non-redundant association rules. Data Min. Knowl. Discov., 9(3):223–248, 2004.

    Article  MathSciNet  Google Scholar 

  24. Mohammed J. Zaki and Karam Gouda. Fast vertical mining using diffsets. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 326–335. ACM Press, 2003.

    Google Scholar 

  25. Mohammed J. Zaki and Ching-Jui Hsiao. Charm: An efficient algorithm for closed item-sets mining. In 2nd SIAM International Conference on Data Mining, April 2002.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer Science+Business Media, LLC

About this paper

Cite this paper

Lucchese, C., Orlando, S., Perego, R., Silvestri, C. (2007). Mining Frequent Closed Itemsets from Distributed Repositories. In: Talia, D., Bilas, A., Dikaiakos, M.D. (eds) Knowledge and Data Management in GRIDs. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-37831-2_14

Download citation

  • DOI: https://doi.org/10.1007/978-0-387-37831-2_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-37830-5

  • Online ISBN: 978-0-387-37831-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics