Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research

Automated image analysis has received increasing attention in social scientific research, yet existing scholarship has mostly covered the application of supervised learning to classify images into predefined categories. This study focuses on the task of unsupervised image clustering, which aims to a...

Bibliographic Details
Published in: Sociological methods & research 2024-08, Vol.53 (3), p.1534-1587
Main Authors: Zhang, Han; Peng, Yilang
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page 1587
container_issue 3
container_start_page 1534
container_title Sociological methods & research
container_volume 53
creator Zhang, Han
Peng, Yilang
description Automated image analysis has received increasing attention in social scientific research, yet existing scholarship has mostly covered the application of supervised learning to classify images into predefined categories. This study focuses on the task of unsupervised image clustering, which aims to automatically discover categories from unlabelled image data. We first review the steps to perform image clustering and then focus on one key challenge in this task—finding intermediate representations of images. We present several methods of extracting intermediate image representations, including the bag-of-visual-words model, self-supervised learning, and transfer learning (in particular, feature extraction with pretrained models). We compare these methods using various visual datasets, including images related to protests in China from Weibo, images about climate change on Instagram, and profile images of the Russian Internet Research Agency on Twitter. In addition, we propose a systematic way to interpret and validate clustering solutions. Results show that transfer learning significantly outperforms the other methods. The dataset used in the pretrained model critically determines what categories the algorithms can discover.
doi_str_mv 10.1177/00491241221082603
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3089903064</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_00491241221082603</sage_id><sourcerecordid>3089903064</sourcerecordid><originalsourceid>FETCH-LOGICAL-c355t-3b3e84a844f26cbefce7a05d95574c8a73024b4a542b6dcd083f8479eb2391de3</originalsourceid><addsrcrecordid>eNp1kE9Lw0AQxRdRsFY_gLcFz6mz_5KNtxKtFgqCtV7DZjNpt7RJ3E0E_fSmVPAgnoZhfu_N4xFyzWDCWJLcAsiUcck4Z6B5DOKEjJhSPNI8ladkdLhHB-CcXISwBWA8ATEi-Xxv1kizXR869K5e39FpTVd16Fv0Hy5gSadt6xtjN7RraGY6XDfefSF9c6E3O3pvOkNdTZeNdcO6tA5ri_QFAxpvN5fkrDK7gFc_c0xWs4fX7ClaPD_Os-kiskKpLhKFQC2NlrLisS2wspgYUGWqVCKtNokALgtplORFXNoStKi0TFIsuEhZiWJMbo6-Q9b3HkOXb5ve18PLXIBOUxAQy4FiR8r6JgSPVd56tzf-M2eQH3rM__Q4aCZHTRiK-nX9X_ANmT1xpA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3089903064</pqid></control><display><type>article</type><title>Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research</title><source>Sociological Abstracts</source><source>SAGE Complete A-Z List</source><creator>Zhang, Han ; Peng, Yilang</creator><creatorcontrib>Zhang, Han ; Peng, Yilang</creatorcontrib><description>Automated image analysis has received increasing attention in social scientific research, yet existing scholarship has mostly covered the application of supervised learning to classify images into predefined categories. This study focuses on the task of unsupervised image clustering, which aims to automatically discover categories from unlabelled image data. We first review the steps to perform image clustering and then focus on one key challenge in this task—finding intermediate representations of images. 
We present several methods of extracting intermediate image representations, including the bag-of-visual-words model, self-supervised learning, and transfer learning (in particular, feature extraction with pretrained models). We compare these methods using various visual datasets, including images related to protests in China from Weibo, images about climate change on Instagram, and profile images of the Russian Internet Research Agency on Twitter. In addition, we propose a systematic way to interpret and validate clustering solutions. Results show that transfer learning significantly outperforms the other methods. The dataset used in the pretrained model critically determines what categories the algorithms can discover.</description><identifier>ISSN: 0049-1241</identifier><identifier>EISSN: 1552-8294</identifier><identifier>DOI: 10.1177/00491241221082603</identifier><language>eng</language><publisher>Los Angeles, CA: SAGE Publications</publisher><subject>Classification ; Climate change ; Clustering ; Extraction ; Imagery ; Learning ; Social research</subject><ispartof>Sociological methods &amp; research, 2024-08, Vol.53 (3), p.1534-1587</ispartof><rights>The Author(s) 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c355t-3b3e84a844f26cbefce7a05d95574c8a73024b4a542b6dcd083f8479eb2391de3</citedby><cites>FETCH-LOGICAL-c355t-3b3e84a844f26cbefce7a05d95574c8a73024b4a542b6dcd083f8479eb2391de3</cites><orcidid>0000-0003-2912-8780 ; 
0000-0001-7711-9518</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://journals.sagepub.com/doi/pdf/10.1177/00491241221082603$$EPDF$$P50$$Gsage$$H</linktopdf><linktohtml>$$Uhttps://journals.sagepub.com/doi/10.1177/00491241221082603$$EHTML$$P50$$Gsage$$H</linktohtml><link.rule.ids>314,780,784,21810,27915,27916,33765,43612,43613</link.rule.ids></links><search><creatorcontrib>Zhang, Han</creatorcontrib><creatorcontrib>Peng, Yilang</creatorcontrib><title>Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research</title><title>Sociological methods &amp; research</title><description>Automated image analysis has received increasing attention in social scientific research, yet existing scholarship has mostly covered the application of supervised learning to classify images into predefined categories. This study focuses on the task of unsupervised image clustering, which aims to automatically discover categories from unlabelled image data. We first review the steps to perform image clustering and then focus on one key challenge in this task—finding intermediate representations of images. We present several methods of extracting intermediate image representations, including the bag-of-visual-words model, self-supervised learning, and transfer learning (in particular, feature extraction with pretrained models). We compare these methods using various visual datasets, including images related to protests in China from Weibo, images about climate change on Instagram, and profile images of the Russian Internet Research Agency on Twitter. In addition, we propose a systematic way to interpret and validate clustering solutions. Results show that transfer learning significantly outperforms the other methods. 
The dataset used in the pretrained model critically determines what categories the algorithms can discover.</description><subject>Classification</subject><subject>Climate change</subject><subject>Clustering</subject><subject>Extraction</subject><subject>Imagery</subject><subject>Learning</subject><subject>Social research</subject><issn>0049-1241</issn><issn>1552-8294</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>BHHNA</sourceid><recordid>eNp1kE9Lw0AQxRdRsFY_gLcFz6mz_5KNtxKtFgqCtV7DZjNpt7RJ3E0E_fSmVPAgnoZhfu_N4xFyzWDCWJLcAsiUcck4Z6B5DOKEjJhSPNI8ladkdLhHB-CcXISwBWA8ATEi-Xxv1kizXR869K5e39FpTVd16Fv0Hy5gSadt6xtjN7RraGY6XDfefSF9c6E3O3pvOkNdTZeNdcO6tA5ri_QFAxpvN5fkrDK7gFc_c0xWs4fX7ClaPD_Os-kiskKpLhKFQC2NlrLisS2wspgYUGWqVCKtNokALgtplORFXNoStKi0TFIsuEhZiWJMbo6-Q9b3HkOXb5ve18PLXIBOUxAQy4FiR8r6JgSPVd56tzf-M2eQH3rM__Q4aCZHTRiK-nX9X_ANmT1xpA</recordid><startdate>20240801</startdate><enddate>20240801</enddate><creator>Zhang, Han</creator><creator>Peng, Yilang</creator><general>SAGE Publications</general><general>SAGE PUBLICATIONS, INC</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7U4</scope><scope>8BJ</scope><scope>BHHNA</scope><scope>DWI</scope><scope>FQK</scope><scope>JBE</scope><scope>WZK</scope><orcidid>https://orcid.org/0000-0003-2912-8780</orcidid><orcidid>https://orcid.org/0000-0001-7711-9518</orcidid></search><sort><creationdate>20240801</creationdate><title>Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research</title><author>Zhang, Han ; Peng, Yilang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c355t-3b3e84a844f26cbefce7a05d95574c8a73024b4a542b6dcd083f8479eb2391de3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Classification</topic><topic>Climate 
change</topic><topic>Clustering</topic><topic>Extraction</topic><topic>Imagery</topic><topic>Learning</topic><topic>Social research</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Han</creatorcontrib><creatorcontrib>Peng, Yilang</creatorcontrib><collection>CrossRef</collection><collection>Sociological Abstracts (pre-2017)</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>Sociological Abstracts</collection><collection>Sociological Abstracts</collection><collection>International Bibliography of the Social Sciences</collection><collection>International Bibliography of the Social Sciences</collection><collection>Sociological Abstracts (Ovid)</collection><jtitle>Sociological methods &amp; research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Han</au><au>Peng, Yilang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research</atitle><jtitle>Sociological methods &amp; research</jtitle><date>2024-08-01</date><risdate>2024</risdate><volume>53</volume><issue>3</issue><spage>1534</spage><epage>1587</epage><pages>1534-1587</pages><issn>0049-1241</issn><eissn>1552-8294</eissn><abstract>Automated image analysis has received increasing attention in social scientific research, yet existing scholarship has mostly covered the application of supervised learning to classify images into predefined categories. This study focuses on the task of unsupervised image clustering, which aims to automatically discover categories from unlabelled image data. We first review the steps to perform image clustering and then focus on one key challenge in this task—finding intermediate representations of images. 
We present several methods of extracting intermediate image representations, including the bag-of-visual-words model, self-supervised learning, and transfer learning (in particular, feature extraction with pretrained models). We compare these methods using various visual datasets, including images related to protests in China from Weibo, images about climate change on Instagram, and profile images of the Russian Internet Research Agency on Twitter. In addition, we propose a systematic way to interpret and validate clustering solutions. Results show that transfer learning significantly outperforms the other methods. The dataset used in the pretrained model critically determines what categories the algorithms can discover.</abstract><cop>Los Angeles, CA</cop><pub>SAGE Publications</pub><doi>10.1177/00491241221082603</doi><tpages>54</tpages><orcidid>https://orcid.org/0000-0003-2912-8780</orcidid><orcidid>https://orcid.org/0000-0001-7711-9518</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0049-1241
ispartof Sociological methods & research, 2024-08, Vol.53 (3), p.1534-1587
issn 0049-1241
1552-8294
language eng
recordid cdi_proquest_journals_3089903064
source Sociological Abstracts; SAGE Complete A-Z List
subjects Classification
Climate change
Clustering
Extraction
Imagery
Learning
Social research
title Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T22%3A31%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Image%20Clustering:%20An%20Unsupervised%20Approach%20to%20Categorize%20Visual%20Data%20in%20Social%20Science%20Research&rft.jtitle=Sociological%20methods%20&%20research&rft.au=Zhang,%20Han&rft.date=2024-08-01&rft.volume=53&rft.issue=3&rft.spage=1534&rft.epage=1587&rft.pages=1534-1587&rft.issn=0049-1241&rft.eissn=1552-8294&rft_id=info:doi/10.1177/00491241221082603&rft_dat=%3Cproquest_cross%3E3089903064%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3089903064&rft_id=info:pmid/&rft_sage_id=10.1177_00491241221082603&rfr_iscdi=true