With photos and videos representing an increasing proportion of the content shared online, I am very interested in their potential for my own research. However, I struggle to incorporate visual data in my work because the qualitative analysis software packages that I am familiar with can only process alphanumeric data. This means that I have to analyse visual data manually, which is a slow process.
The paper by Hu et al. (2014) that I mentioned in my last post (you know, the one about cats not ruling the Internet) describes a computer-assisted approach to analysing photographs. This is what the authors did, as described in section 3.2 of the paper:
Coming up with good meaningful content categories is known to be challenging, especially for images since they contain much richer features than text. Therefore, as an initial pass, we sought help from computer vision techniques to get an overview of what categories exist in an efficient manner. Specifically, we first used the classical Scale Invariant Feature Transform (SIFT) algorithm (Lowe 1999) to detect and extract local discriminative features from photos in the sample. The feature vectors for photos are of 128 dimensions. Following the standard image vector quantization approach (i.e., SIFT feature clustering (Szeliski 2011)), we obtained the codebook vectors for each photo. Finally, we used k-means clustering to obtain 15 clusters of photos where the similarity between two photos are calculated in terms of Euclidean distance between their codebook vectors. These clusters served as an initial set of our coding categories, where each photo belongs to only one category.
To further improve the quality of this automated categorization, we asked two human coders who are regular users of Instagram to independently examine photos in each one of the 15 categories. They analyzed the affinity of the themes within the category and across categories, and manually adjusted categories if necessary (i.e., move photos to a more appropriate category or merge two categories if their themes are overlapped). Finally, through a discussion session where the two coders exchanged their coding results, discussed their categories and resolved their conflicts, we concluded with 8-category coding scheme of photos (see Table 1) where both coders agreed on, i.e., the Fleiss’ kappa is κ = 1. It is important to note that the stated goal of our coding was to manually provide a descriptive evaluation of photo content, not to hypothesize on the motivation of the user who is posting the photos.
Based on our 8-category coding scheme, the two coders independently categorized the rest of the 800 photos based on their main themes and their descriptions and hashtags if any (e.g., if a photo has a girl with her dog, and the description of this photo is “look at my cute dog”, then this photo is categorized into “Pet” category). The coders were asked to assign a single category to each photo (i.e., we avoid dual assignment). The initial Fleiss’ kappa is κ = 0.75. To resolve discrepancies between coders, we asked a third-party judge to view the unresolved photos and assign them to the most appropriate categories.
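To make the automated first pass a little more concrete, here is a minimal Python sketch of the general idea (local descriptors, then a codebook vector per photo, then k-means on those vectors using Euclidean distance). It is not the authors' code. It assumes the SIFT descriptors have already been extracted for each photo (for instance with OpenCV), uses tiny made-up 2-dimensional descriptors instead of real 128-dimensional ones, and all function names, the codebook size and the normalisation step are my own assumptions, since the paper does not report them.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def nearest(point, centroids):
    """Index of the centroid closest to `point` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def codebook_vector(descriptors, visual_words):
    """Normalised histogram of how often each visual word occurs in one photo."""
    hist = [0.0] * len(visual_words)
    for d in descriptors:
        hist[nearest(d, visual_words)] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def cluster_photos(photo_descriptors, n_words, n_clusters, seed=0):
    """Group photos by the similarity of their codebook vectors."""
    # Step 1: pool every descriptor and cluster them into "visual words".
    pooled = [d for descs in photo_descriptors for d in descs]
    visual_words, _ = kmeans(pooled, n_words, seed=seed)
    # Step 2: describe each photo by its histogram over the visual words.
    vectors = [codebook_vector(descs, visual_words) for descs in photo_descriptors]
    # Step 3: cluster the photos themselves (the paper used 15 clusters).
    _, labels = kmeans(vectors, n_clusters, seed=seed)
    return labels


# Made-up descriptors: photos 0-1 share one kind of local feature, photos 2-3 another.
# Real SIFT descriptors would have 128 dimensions and there would be many per photo.
photos = [
    [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)],
    [(0.1, 0.0), (0.0, 0.2), (0.2, 0.2)],
    [(10.0, 10.1), (10.2, 10.0), (10.1, 10.1)],
    [(10.1, 10.0), (10.0, 10.2), (10.2, 10.2)],
]
labels = cluster_photos(photos, n_words=2, n_clusters=2)
print(labels)  # photos 0 and 1 should share one label, photos 2 and 3 the other
```

The cluster numbers themselves carry no meaning; they are only a starting point that human coders would then inspect, rename, merge or split, as the authors describe.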
Hmm… sounds like I need to get a Computer Science degree to be able to do this 😦 Plus, the process described still relies heavily on manual analysis to refine the coding scheme and to do the actual categorisation. Still, it sounds like a promising ‘starting point’ to develop an inductive coding scheme. So, I am adding a note to my diary to look into this during my sabbatical (one can always dream big!)
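As for the manual part, the agreement figures quoted above (κ = 0.75, then κ = 1) are Fleiss’ kappa values, which compare how often the coders agree with how often they would be expected to agree by chance. Below is a small sketch of the standard formula in plain Python. The function name, the input layout and the example category labels are illustrative assumptions, not data from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`: one list per photo, holding the
    category that each coder assigned to that photo."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(r) != n_raters for r in ratings):
        raise ValueError("every photo needs the same number of coders")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: share of coder pairs that agree, averaged over photos.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: based on how often each category is used overall.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_chance = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_chance == 1:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes from two coders for four photos (labels are made up).
example = [
    ["Pet", "Pet"],
    ["Food", "Food"],
    ["Pet", "Food"],   # the coders disagree on this photo
    ["Food", "Food"],
]
print(round(fleiss_kappa(example), 3))
```

With only two coders, one disagreement in four photos already pulls the value well below 1, which gives a feel for how demanding a κ of 0.75 over hundreds of photos is.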
Do you use visuals in your research or work? How do you analyse them?
Lowe, D. G. 1999. Object recognition from local scale-invariant features. In ICCV.
Szeliski, R. 2011. Computer vision: algorithms and applications. Springer.