I suspect that we have all had the experience of filling in a form and not finding a box that exactly describes us. That’s because those forms are trying to fit us into neat categories, whereas our identities, behaviours and preferences are often fluid, contextual, and relational. So, there is a clash between the managerial desire for clarity on the one hand and the reality of human nature on the other.
A recent study by Marta Stelmaszak, Erica L. Wagner and Nicolle Nixon DuPont follows an organisation’s attempt to capture that complex reality in its databases, and offers a powerful and cautionary tale. The setting was a university in the US, and the goal of the exercise was to recognise and support LGBTQ+ members of the community by collecting data about their legal sex, gender identity and sexual orientation in university systems. The research is reported in this paper: Stelmaszak, M., Wagner, E. L., & DuPont, N. N. (2024). Recognition in Personal Data: Data Warping, Recognition Concessions, and Social Justice. MIS Quarterly, 48(4), 1611–1636. https://doi.org/10.25300/MISQ/2023/18088
The heart of the issue discussed in the paper by Stelmaszak and her colleagues is classification. Most organisational systems need to sort people into discrete categories in order to function: male vs female, customer vs non-customer, high vs low risk, employee vs customer, and so on. These classifications are consequential because they shape how people are seen and how resources are allocated, as discussed in the book “Sorting Things Out” by Geoffrey C. Bowker and Susan Leigh Star (open access version available here). But, as this paper discusses, these classifications are also inherently reductive, because they use discrete, static categories to represent human traits that exist on a spectrum and may change over time.
The paper identifies three types of challenges that organisations face when trying to codify complex human traits that exist on a spectrum or change over time. Together, these challenges create a problem the authors call “Data Warping”: the person ends up misrepresented in the dataset.
1 Challenges related to data design
Decisions about which categories to include, how many options to offer, and whether change over time is permitted are often treated as technical details. But these decisions have ethical as well as operational consequences: overly rigid data structures may simplify reporting, but they can also misrepresent or silence important variation (a small sketch after this list illustrates the point).
2 Challenges related to data governance
Once data are collected, they travel across systems: databases are shared with different departments, and managers may make high-stakes decisions based on information that lacks context. Classification systems therefore need to be open to being rethought and updated.
3 Challenges related to data interpretation
Data about people are often treated as facts, rather than partial representations. The risk is that simplified representations of identity or behaviour become amplified and normalised.
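To make the data-design challenge concrete, here is a minimal, hypothetical sketch in Python. It is my illustration rather than anything from the paper or the university’s actual systems: the class and field names are invented, but they show how seemingly technical schema choices (which categories exist, whether more than one can be selected, whether free text and dates are kept) determine what the organisation can later see and report.

```python
# Hypothetical sketch only: names and fields are illustrative, not the
# university's actual data model.
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class GenderOption(Enum):
    # Data design decision: which categories exist at all, and how many.
    WOMAN = "woman"
    MAN = "man"
    NON_BINARY = "non-binary"
    PREFER_NOT_TO_SAY = "prefer not to say"


@dataclass
class RigidProfile:
    # Easy to validate, aggregate and report on, but forces one static box
    # per person.
    person_id: str
    gender: GenderOption


@dataclass
class FlexibleProfile:
    # Allows plurality, self-description and change over time, but is harder
    # to aggregate and easier to misread once it travels to other systems.
    person_id: str
    genders: list[GenderOption] = field(default_factory=list)  # more than one box
    self_description: str = ""                                  # free text a report may never use
    recorded_on: date = field(default_factory=date.today)       # dates make change visible
```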
Data warping trade-offs
In their study, Stelmaszak and colleagues identified four strategies to deal with data warping, each with its own dilemma.
| Strategy | Example | Trade-off | Lesson |
|---|---|---|---|
| Prioritising Meaning over Usability | The university collected rich data about users’ gender identity and sexual orientation, but ended up not making all the data available to departments because of privacy concerns | This protected individuals’ privacy, but it meant the data were not available for decision making, reporting, etc. | Data collected but not used fails to create impact |
| Prioritising Usability over Meaning | To capture non-binary gender identities, the university used a dropdown list mapped to a validation table | This made the data usable by different systems and easy to report, but it failed to capture the full spectrum of identity | Standardisation is essential for analytics but can erase nuance |
| Compromising on both Usability and Meaning | It was not possible to select more than one category (e.g., more than one gender). As a compromise, the university added a free-text box that captured the nuance but was not included in analysis | The unstructured text was collected but not used for analysis, so plurality was not fully recognised in reports | Compromises can result in data that serve neither human nor system needs |
| Forfeiting both Usability and Meaning | To respect gender fluidity, the team decided to keep only the most recent response in the system (i.e., past responses to the question were not kept) | This avoided mislabelling someone with a past identity or anchoring them to a previous category, but it also erased the ability to map identity fluidity within the community | The choice not to capture certain data shapes what you can represent |
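The fourth row of the table is worth making concrete. Below is a short, hypothetical Python sketch (again my illustration, not the university’s implementation) of the two options: overwriting the latest answer respects fluidity but erases history, while keeping a dated log preserves change at the cost of anchoring people to labels they may have left behind.

```python
# Hypothetical illustration of the "keep only the most recent response" choice.

# Option A: what the team chose -- only the latest answer survives.
latest_identity: dict[str, str] = {}

def record_latest(person_id: str, response: str) -> None:
    latest_identity[person_id] = response   # past answers are simply overwritten


# Option B: what was forfeited -- an append-only, dated history.
identity_history: dict[str, list[tuple[str, str]]] = {}

def record_history(person_id: str, response: str, on_date: str) -> None:
    # Enables questions such as "how does identity change over time in our
    # community?", but holds on to categories people may no longer identify with.
    identity_history.setdefault(person_id, []).append((on_date, response))
```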
I think that, while the paper focuses on gender identity, it is relevant well beyond this context. Any attempt to capture human characteristics that are ambiguous, contextual, or changeable will face similar tensions. And it certainly resonates with challenges I witnessed in my own research about customer profiling (here and here). So, the next time you design a form (or have to fill one in!), build a customer segment, or create (or read) a report, ask yourself: what human complexity is being “warped” by these categories, and what is being traded off for simplification?
