{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T23:58:04Z","timestamp":1768262284083,"version":"3.49.0"},"reference-count":55,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2017,7,14]],"date-time":"2017-07-14T00:00:00Z","timestamp":1499990400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Topological data analysis is a noble approach to extract meaningful information from high-dimensional data and is robust to noise. It is based on topology, which aims to study the geometric shape of data. In order to apply topological data analysis, an algorithm called mapper is adopted. The output from mapper is a simplicial complex that represents a set of connected clusters of data points. In this paper, we explore the feasibility of topological data analysis for mining social network data by addressing the problem of image popularity. We randomly crawl images from Instagram and analyze the effects of social context and image content on an image\u2019s popularity using mapper. Mapper clusters the images using each feature, and the ratio of popularity in each cluster is computed to determine the clusters with a high or low possibility of popularity. Then, the popularity of images are predicted to evaluate the accuracy of topological data analysis. This approach is further compared with traditional clustering algorithms, including k-means and hierarchical clustering, in terms of accuracy, and the results show that topological data analysis outperforms the others. Moreover, topological data analysis provides meaningful information based on the connectivity between the clusters.<\/jats:p>","DOI":"10.3390\/e19070360","type":"journal-article","created":{"date-parts":[[2017,7,14]],"date-time":"2017-07-14T10:45:02Z","timestamp":1500029102000},"page":"360","update-policy":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Extracting Knowledge from the Geometric Shape of Social Network Data Using Topological Data Analysis"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0003-0327-8879","authenticated-orcid":false,"given":"Khaled","family":"Almgren","sequence":"first","affiliation":[{"name":"Computer Science and Engineering Department, University of Bridgeport, Bridgeport, CT 06614, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Minkyu","family":"Kim","sequence":"additional","affiliation":[{"name":"ASML, 77 Danbury RD, Wilton, CT 06897, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeongkyu","family":"Lee","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering Department, University of Bridgeport, Bridgeport, CT 06614, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2017,7,14]]},"reference":[{"key":"ref_1","unstructured":"Twitter (2017, April 13). Twitter Usage. Available online: https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/about.twitter.com\/company."},{"key":"ref_2","unstructured":"Facebook (2017, April 13). Facebook Stats. Available online: https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/newsroom.fb.com\/company-info\/."},{"key":"ref_3","unstructured":"Instagram (2017, April 13). Instagram Stats. Available online: https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/business.instagram.com."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1109\/TKDE.2013.109","article-title":"Data mining with big data","volume":"26","author":"Wu","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1093\/nsr\/nwt032","article-title":"Challenges of big data analysis","volume":"1","author":"Fan","year":"2014","journal-title":"Natl. Sci. Rev."},{"key":"ref_6","unstructured":"Becker, H., Naaman, M., and Gravano, L. (2014, January 22). Event Identification in Social Media. Proceedings of the International Workshop on the Web and Databases, Snowbird, UT, USA."},{"key":"ref_7","unstructured":"Edelsbrunner, H., Letscher, D., and Zomorodian, A. (2000, January 12\u201314). Topological persistence and simplification. Proceedings of the 41st Annual Symposium on Foundations of Computer Science, Washington, DC, USA."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1090\/S0273-0979-09-01249-X","article-title":"Topology and data","volume":"46","author":"Carlsson","year":"2009","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"957","DOI":"10.1093\/bioinformatics\/btm033","article-title":"Disease-specific genomic analysis: Identifying the signature of pathologic biology","volume":"23","author":"Nicolau","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"7265","DOI":"10.1073\/pnas.1102826108","article-title":"Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival","volume":"108","author":"Nicolau","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_11","unstructured":"Choudhary, D., and Bansal, S. (2017, July 14). Available online: https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/cse.iitk.ac.in\/users\/cs365\/2014\/submissions\/deepakc\/project\/report.pdf."},{"key":"ref_12","unstructured":"Singh, G., M\u00e9moli, F., and Carlsson, G.E. (2007, January 2\u20133). Topological methods for the analysis of high dimensional data sets and 3D object recognition. Proceedings of the 2007 Symposium on Point-Based Graphics, Prague, Czech Republic."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gidea, M., and Katz, Y.A. (arXiv, 2017). Topological Data Analysis of Financial Time Series: Landscapes of Crashes, arXiv.","DOI":"10.2139\/ssrn.2931836"},{"key":"ref_14","unstructured":"Schebesch, K.B., and Stecking, R.W. (2017). Topological Data. Operations Research Proceedings 2015, Springer."},{"key":"ref_15","unstructured":"Webster, M. (2005). The Merriam-Webster Dictionary, Merriam-Webster."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Almgren, K., Lee, J., and Kim, M. (2016, January 15\u201317). Predicting the Future Popularity of Images on Social Networks. Proceedings of the 3rd Multidisciplinary International Social Networks Conference on SocialInformatics, Union, NJ, USA.","DOI":"10.1145\/2955129.2955154"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Almgren, K., Lee, J., and Kim, M. (2016, January 14\u201315). Prediction of image popularity over time on social media networks. Proceedings of the IEEE Annual Connecticut Conference on Industrial Electronics, Technology & Automation (CT-IETA), Bridgeport, CT, USA.","DOI":"10.1109\/CT-IETA.2016.7868253"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"McParlane, P.J., Moshfeghi, Y., and Jose, J.M. (2014, January 1\u20134). Nobody comes here anymore, it\u2019s too crowded; predicting image popularity on flickr. Proceedings of the International Conference on Multimedia Retrieval, Glasgow, UK.","DOI":"10.1145\/2578726.2578776"},{"key":"ref_19","unstructured":"Can, E.F., Oktay, H., and Manmatha, R. (November, January 27). Predicting retweet count using visual cues. Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1145\/2503792.2503797","article-title":"Information diffusion in online social networks: A survey","volume":"42","author":"Guille","year":"2013","journal-title":"ACM SIGMOD Rec."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Bakshy, E., Rosenn, I., Marlow, C., and Adamic, L. (2012, January 16\u201320). The role of social networks in information diffusion. Proceedings of the 21st International Conference on World Wide Web, Lyon, France.","DOI":"10.1145\/2187836.2187907"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Cappallo, S., Mensink, T., and Snoek, C.G. (2015, January 23\u201326). Latent factors of visual popularity prediction. Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, Shanghai, China.","DOI":"10.1145\/2671188.2749405"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Khosla, A., Das Sarma, A., and Hamid, R. What makes an image popular?. Proceedings of the 23rd International Conference on World Wide Web, 7\u201311 April 2014.","DOI":"10.1145\/2566486.2567996"},{"key":"ref_24","unstructured":"Munkres, J.R. (2000). Topology, Prentice Hall."},{"key":"ref_25","unstructured":"Cartan, H., and Eilenberg, S. (2016). Homological Algebra (PMS-19), Princeton University Press."},{"key":"ref_26","unstructured":"Murphy, N. (2017, July 14). Topological Data Analysis. Available online: https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/www.colby.edu\/math\/program\/honorsprojects\/2016-Murphy-HonorsThesis.pdf."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/s00454-004-1146-y","article-title":"Computing persistent homology","volume":"33","author":"Zomorodian","year":"2005","journal-title":"Discret. Comput. Geom."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1007\/s00454-006-1276-5","article-title":"Stability of persistence diagrams","volume":"37","author":"Edelsbrunner","year":"2007","journal-title":"Discret. Comput. Geom."},{"key":"ref_29","unstructured":"Michel, B. (2017, July 14). Available online: https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/www.turing-gateway.cam.ac.uk\/sites\/default\/files\/asset\/doc\/1606\/BertrandMichel.pdf."},{"key":"ref_30","unstructured":"M\u00fcllner, D., and Babu, A. (2017, July 14). Python Mapper: An Open-Source Toolchain for Data Exploration, Analysis and Visualization. Available online: https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/danifold.net\/mapper."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Erlandsson, F., Br\u00f3dka, P., Borg, A., and Johnson, H. (2016). Finding influential users in social media using association rule learning. Entropy, 18.","DOI":"10.3390\/e18050164"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1007\/s13278-016-0360-y","article-title":"An empirical comparison of influence measurements for social network analysis","volume":"6","author":"Almgren","year":"2016","journal-title":"Soc. Netw. Anal. Min."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, W., Gao, Q., and Xiong, H. (2016). Temporal Predictability of Online Behavior in Foursquare. Entropy, 18.","DOI":"10.3390\/e18080296"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2662","DOI":"10.3390\/e15072662","article-title":"Exploring the characteristics of innovation adoption in social networks: Structure, homophily, and strategy","volume":"15","author":"Li","year":"2013","journal-title":"Entropy"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"4648","DOI":"10.3390\/e15114648","article-title":"From Observable Behaviors to Structures of Interaction in Binary Games of Strategic Complements","volume":"15","year":"2013","journal-title":"Entropy"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Silva, T.H., de Melo, P.O.V., Almeida, J.M., Salles, J., and Loureiro, A.A. (2013, January 20\u201323). A picture of Instagram is worth more than a thousand words: Workload characterization and application. Proceedings of the 2013 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS), Cambridge, MA, USA.","DOI":"10.1109\/DCOSS.2013.59"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Mejova, Y., Haddadi, H., Noulas, A., and Weber, I. (2015, January 18\u201320). #Foodporn: Obesity patterns in culinary interactions. Proceedings of the 5th International Conference on Digital Health, Florence, Italy.","DOI":"10.1145\/2750511.2750524"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"3053","DOI":"10.3390\/e17053053","article-title":"Predicting community evolution in social networks","volume":"17","author":"Saganowski","year":"2015","journal-title":"Entropy"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"5419","DOI":"10.3390\/e15125419","article-title":"Core-based dynamic community detection in mobile social networks","volume":"15","author":"Xu","year":"2013","journal-title":"Entropy"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Aloufi, S., Zhu, S., and El Saddik, A. (2017). On the Prediction of Flickr Image Popularity by Analyzing Heterogeneous Social Sensory Data. Sensors, 17.","DOI":"10.3390\/s17030631"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Yamaguchi, K., Berg, T.L., and Ortiz, L.E. (2014, January 3\u20137). Chic or social: Visual popularity analysis in online fashion networks. Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654958"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Totti, L.C., Costa, F.A., Avila, S., Valle, E., Meira, W., and Almeida, V. (2014, January 23\u201326). The impact of visual attributes on online image diffusion. Proceedings of the 2014 ACM Conference on Web Science, Bloomington, IN, USA.","DOI":"10.1145\/2615569.2615700"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Niu, X., Li, L., Mei, T., Shen, J., and Xu, K. (2012, January 9\u201313). Predicting image popularity in an incomplete social media community by a weighted bi-partite graph. Proceedings of the 2012 IEEE International Conference on Multimedia and Expo (ICME), Melbourne, VIC, Australia.","DOI":"10.1109\/ICME.2012.43"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Gelli, F., Uricchio, T., Bertini, M., Del Bimbo, A., and Chang, S.F. (2015, January 26\u201330). Image popularity prediction in social media using sentiment and context features. Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, QLD, Australia.","DOI":"10.1145\/2733373.2806361"},{"key":"ref_45","first-page":"2","article-title":"Writing Captions","volume":"32","author":"Oglesbee","year":"1998","journal-title":"Commun. J. Educ. Today"},{"key":"ref_46","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (arXiv, 2013). Efficient estimation of word representations in vector space, arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"4517","DOI":"10.1063\/1.434593","article-title":"Information theory, distance matrix, and molecular branching","volume":"67","author":"Bonchev","year":"1977","journal-title":"J. Chem. Phys."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Larsen, B., and Aone, C. (1999, January 15\u201318). Fast and effective text mining using linear-time document clustering. Proceedings of the Fifth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining, San Diego, CA, USA.","DOI":"10.1145\/312129.312186"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Deza, M.M., and Deza, E. (2009). Encyclopedia of distances. Encyclopedia of Distances, Springer.","DOI":"10.1007\/978-3-642-00234-2"},{"key":"ref_50","unstructured":"Rehurek, R., and Sojka, P. (2011). Gensim\u2013Python Framework for Vector Space Modelling, Masaryk University."},{"key":"ref_51","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Bird, S. (2006). NLTK: The natural language toolkit. Proceedings of the COLING\/ACL on Interactive Presentation Sessions, Sydney, NSW, Australia, 17\u201318 July 2006, Association for Computational Linguistics.","DOI":"10.3115\/1225403.1225421"},{"key":"ref_53","unstructured":"Arthur, D., and Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, 7\u20139 January 2007, Society for Industrial and Applied Mathematics."},{"key":"ref_54","first-page":"100","article-title":"Algorithm AS 136: A k-means clustering algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"J. R. Stat. Soc. Ser. C"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1007\/BF02289588","article-title":"Hierarchical clustering schemes","volume":"32","author":"Johnson","year":"1967","journal-title":"Psychometrika"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/www.mdpi.com\/1099-4300\/19\/7\/360\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:42:45Z","timestamp":1760208165000},"score":1,"resource":{"primary":{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/www.mdpi.com\/1099-4300\/19\/7\/360"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7,14]]},"references-count":55,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2017,7]]}},"alternative-id":["e19070360"],"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.3390\/e19070360","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,7,14]]}}}