Ensuring Ethical AI: Lessons from Amazon's Barrier-to-Exit Analysis

3 May 2024


(1) Jonathan H. Rystrøm.

Abstract and Introduction

Previous Literature

Methods and Data



Conclusions and References

A. Validation of Assumptions

B. Other Models

C. Pre-processing steps

6 Conclusion

Understanding how recommender systems shape our behaviour is essential to avoid manipulation. In this paper, we investigated whether the Amazon recommender system has made it harder for users to change their preferences. By analysing the Barrier-to-Exit (Rakova & Chowdhury, 2019) of more than 50,000 users, we found a highly significant growth in Barrier-to-Exit over time, which indicates that it has indeed become harder for the analysed users to change their preferences.
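To make the idea of "growth over time" concrete, the following is a minimal sketch and not the model used in the paper. It assumes hypothetical per-user scores on a log scale, observed at a few time points, and estimates one common time slope after removing each user's own average (a simple fixed-effects stand-in for a per-user random intercept). The function names, the synthetic data, and the chosen slope of 0.05 are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def within_user_slope(rows):
    """Estimate a common time slope after removing each user's own mean.

    rows: iterable of (user_id, time, value) tuples.
    Centring time and value per user removes user-specific baselines,
    so only change within a user contributes to the slope.
    """
    by_user = defaultdict(list)
    for user, t, y in rows:
        by_user[user].append((t, y))

    sxy = 0.0
    sxx = 0.0
    for obs in by_user.values():
        t_mean = sum(t for t, _ in obs) / len(obs)
        y_mean = sum(y for _, y in obs) / len(obs)
        for t, y in obs:
            sxy += (t - t_mean) * (y - y_mean)
            sxx += (t - t_mean) ** 2
    if sxx == 0:
        raise ValueError("need at least one user observed at two distinct times")
    return sxy / sxx


def make_synthetic_rows(n_users=200, true_slope=0.05, seed=0):
    """Generate made-up log-scale scores (hypothetical, NOT Amazon data)."""
    rng = random.Random(seed)
    rows = []
    for user in range(n_users):
        baseline = rng.gauss(0.0, 1.0)  # user-specific starting level
        for t in rng.sample(range(21), k=4):  # four distinct time points
            noise = rng.gauss(0.0, 0.2)
            rows.append((user, t, baseline + true_slope * t + noise))
    return rows


rows = make_synthetic_rows()
slope = within_user_slope(rows)
print(f"estimated change in log score per time unit: {slope:.3f}")
print(f"implied multiplicative change per time unit: {math.exp(slope):.3f}")
```

A positive slope on the log scale corresponds to multiplicative growth in the underlying score. Note that a sketch like this only describes users who are observed at several time points, which mirrors the sampling caveat discussed in the next paragraph.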

However, sampling bias induced by the calculation of Barrier-to-Exit makes it difficult to draw conclusions about the general population of Amazon customers. This highlights the dilemma of portability in measuring socio-technical systems (Selbst et al., 2019): accurately evaluating a concept like "changing preferences" requires adapting to the context of the system, which makes it more difficult to generalise (and compare) to other systems.

Comparing recommender systems is necessary for ensuring that they respect human autonomy (Varshney, 2020) and live up to new regulations such as the EU AI Act (Kop, 2021). Further work should aim to create auditing procedures and metrics that allow third parties to measure potential preference manipulation in a way that fits within the context of the industry and allows for comparisons between different systems. This will help assess the pressures of Surveillance Capitalism (Zuboff, 2019) on human autonomy.


References

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59 (4), 390–412. (Publisher: Elsevier)

Bartoń, K. (2022, September). MuMIn: Multi-Model Inference. Retrieved 2023-01-08, from https://CRAN.R-project.org/package=MuMIn

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014, June). Fitting Linear Mixed-Effects Models using lme4. arXiv. Retrieved 2023-01-12, from http://arxiv.org/abs/1406.5823 (arXiv:1406.5823 [stat])

Bennett, J., & Lanning, S. (2007). The Netflix prize. In Proceedings of KDD Cup and Workshop (Vol. 2007, p. 35). New York.

Burrell, J. (2016, June). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3 (1), 2053951715622512. Retrieved 2022-10-11, from https://ezproxy-prd.bodleian.ox.ac.uk:2246/doi/full/10.1177/2053951715622512 (Publisher: SAGE Publications Ltd) doi: 10.1177/2053951715622512

Calvo, R. A., Peters, D., Vold, K., & Ryan, R. M. (2020). Supporting Human Autonomy in AI Systems: A Framework for Ethical Enquiry. In C. Burr & L. Floridi (Eds.), Ethics of Digital Well-Being: A Multidisciplinary Approach (pp. 31–54). Cham: Springer International Publishing. Retrieved 2022-08-16, from https://doi.org/10.1007/978-3-030-50585-1_2 doi: 10.1007/978-3-030-50585-1_2

Carroll, M. D., Dragan, A., Russell, S., & Hadfield-Menell, D. (2022). Estimating and penalizing induced preference shifts in recommender systems. In International Conference on Machine Learning (pp. 2686–2708). PMLR.

Feng, C., Wang, H., Lu, N., Chen, T., He, H., Lu, Y., & Tu, X. M. (2014, April). Log-transformation and its implications for data analysis. Shanghai Archives of Psychiatry, 26 (2), 105–109. Retrieved 2023-01-10, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120293/ doi: 10.3969/j.issn.1002-0829.2014.02.009

Floridi, L., Holweg, M., Taddeo, M., Amaya Silva, J., Mökander, J., & Wen, Y. (2022, March). capAI - A Procedure for Conducting Conformity Assessment of AI Systems in Line with the EU Artificial Intelligence Act (SSRN Scholarly Paper No. ID 4064091). Rochester, NY: Social Science Research Network. Retrieved 2022-04-01, from https://papers.ssrn.com/abstract=4064091 doi: 10.2139/ssrn.4064091

Fox, J. (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software, 8 (15). Retrieved 2023-01-10, from http://www.jstatsoft.org/v08/i15/ doi: 10.18637/jss.v008.i15

Fox, J. (2015). Applied regression analysis and generalized linear models. Sage Publications.

Franklin, M., Ashton, H., Gorman, R., & Armstrong, S. (2022, May). Missing Mechanisms of Manipulation in the EU AI Act. The International FLAIRS Conference Proceedings, 35. Retrieved 2022-12-31, from https://journals.flvc.org/FLAIRS/article/view/130723 doi: 10.32473/flairs.v35i.130723

Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021, December). Datasheets for Datasets. arXiv. Retrieved 2022-10-20, from http://arxiv.org/abs/1803.09010 (arXiv:1803.09010 [cs])

Ghasemi, A., & Zahediasl, S. (2012, December). Normality Tests for Statistical Analysis: A Guide for Non-Statisticians. International Journal of Endocrinology and Metabolism, 10 (2), 486–489. Retrieved 2023-01-11, from https://brief.land/ijem/articles/71904.html doi: 10.5812/ijem.3505

Gorelick, M., & Ozsvald, I. (2020). High performance Python: practical performant programming for humans (2nd ed.). Beijing [China]; Boston [MA]: O’Reilly.

Harper, F. M., & Konstan, J. A. (2016, January). The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems, 5 (4), 1–19. Retrieved 2022-12-29, from https://dl.acm.org/doi/10.1145/2827872 doi: 10.1145/2827872

Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., . . . Oliphant, T. E. (2020, September). Array programming with NumPy. Nature, 585 (7825), 357–362. Retrieved 2021-12-27, from https://www.nature.com/articles/s41586-020-2649-2 doi: 10.1038/s41586-020-2649-2

Jannach, D., & Adomavicius, G. (2016, September). Recommendations with a Purpose. In Proceedings of the 10th ACM Conference on Recommender Systems (pp. 7–10). Boston Massachusetts USA: ACM. Retrieved 2022-09-01, from https://dl.acm.org/doi/10.1145/2959100.2959186 doi: 10.1145/2959100.2959186

Jiang, R., Chiappa, S., Lattimore, T., György, A., & Kohli, P. (2019, January). Degenerate Feedback Loops in Recommender Systems. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 383–390). Honolulu HI USA: ACM. Retrieved 2022-09-07, from https://dl.acm.org/doi/10.1145/3306618.3314288 doi: 10.1145/3306618.3314288

Kipf, T. N., & Welling, M. (2017, February). Semi-Supervised Classification with Graph Convolutional Networks. arXiv. Retrieved 2022-12-14, from http://arxiv.org/abs/1609.02907 (arXiv:1609.02907 [cs, stat])

Knijnenburg, B. P., Reijmer, N. J., & Willemsen, M. C. (2011, October). Each to his own: how different users call for different interaction methods in recommender systems. In Proceedings of the fifth ACM conference on Recommender systems (pp. 141–148). New York, NY, USA: Association for Computing Machinery. Retrieved 2023-01-11, from https://doi.org/10.1145/2043932.2043960 doi: 10.1145/2043932.2043960

Kop, M. (2021, November). EU Artificial Intelligence Act: The European Approach to AI. , 11.

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: tests in linear mixed effects models. Journal of Statistical Software, 82 (13). (Publisher: The Foundation for Open Access Statistics)

Lam, S. K., Pitrou, A., & Seibert, S. (2015). Numba: A LLVM-based Python JIT compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC (pp. 1–6).

Ledwich, M., & Zaitsev, A. (2019). Algorithmic extremism: Examining YouTube’s rabbit hole of radicalization. arXiv preprint arXiv:1912.11211.

Leino, J., & Räihä, K.-J. (2007, October). Case Amazon: ratings and reviews as part of recommendations. In Proceedings of the 2007 ACM conference on Recommender systems (pp. 137–140). New York, NY, USA: Association for Computing Machinery. Retrieved 2023-01-08, from https://doi.org/10.1145/1297231.1297255 doi: 10.1145/1297231.1297255

Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7 (1), 76–80. (Publisher: IEEE)

Maddala, G. S. (1971, March). The Use of Variance Components Models in Pooling Cross Section and Time Series Data. Econometrica, 39 (2), 341. Retrieved 2023-01-11, from https://www.jstor.org/stable/1913349?origin=crossref doi: 10.2307/1913349

McAuley, J., Targett, C., Shi, Q., & Hengel, A. v. d. (2015, June). Image-based Recommendations on Styles and Substitutes. arXiv. Retrieved 2022-12-29, from http://arxiv.org/abs/1506.04757 (arXiv:1506.04757 [cs])

McKinney, W. (2011). pandas: a foundational Python library for data analysis and statistics. Python for high performance and scientific computing, 14 (9), 1–9. (Publisher: Seattle)

Millecamp, M., Htun, N. N., Jin, Y., & Verbert, K. (2018). Controlling Spotify recommendations: effects of personal characteristics on music recommender user interfaces. In Proceedings of the 26th Conference on user modeling, adaptation and personalization (pp. 101–109).

Nakagawa, S., Johnson, P. C., & Schielzeth, H. (2017). The coefficient of determination R² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface, 14 (134), 20170213. (Publisher: The Royal Society)

Nguyen, T. T., Hui, P.-M., Harper, F. M., Terveen, L., & Konstan, J. A. (2014). Exploring the filter bubble: the effect of using recommender systems on content diversity. In Proceedings of the 23rd international conference on World wide web - WWW ’14 (pp. 677–686). Seoul, Korea: ACM Press. Retrieved 2022-12-31, from http://dl.acm.org/citation.cfm?doid=2566486.2568012 doi: 10.1145/2566486.2568012

Ni, J., Li, J., & McAuley, J. (2019). Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 188–197).

Nonnecke, B., & Preece, J. (2000). Lurker demographics: Counting the silent. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 73–80).

O’Hara, R., & Kotze, J. (2010, January). Do not log-transform count data. Nature Precedings. Retrieved 2023-01-11, from https://www.nature.com/articles/npre.2010.4136.1 doi: 10.1038/npre.2010.4136.1

Pandita, R. (2017). Internet: A change agent an overview of internet penetration & growth across the world. International Journal of Information Dissemination and Technology, 7 (2), 83–91. Retrieved 2023-01-11, from https://www.proquest.com/docview/1920419032/abstract/14B253C0DD6A4277PQ/1 (Num Pages: 83-91 Place: Ambala, India Publisher: Maharishi Markandehwar University)

Papakyriakopoulos, O., Serrano, J. C. M., & Hegelich, S. (2020). Political communication on social media: A tale of hyperactive users and bias in recommender systems. Online Social Networks and Media, 15, 100058. (Publisher: Elsevier)

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., . . . Dubourg, V. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. (Publisher: JMLR.org)

Poole, M. A., & O’Farrell, P. N. (1971). The assumptions of the linear regression model. Transactions of the Institute of British Geographers, 145–158. (Publisher: JSTOR)

Popper, K. R. (1970). Normal science and its dangers. Cambridge University Press Cambridge.

Rakova, B., & Chowdhury, R. (2019, September). Human self-determination within algorithmic sociotechnical systems. arXiv. Retrieved 2022-09-07, from http://arxiv.org/abs/1909.06713 (arXiv:1909.06713 [cs])

Raschka, S., Patterson, J., & Nolet, C. (2020). Machine learning in python: Main developments and technology trends in data science, machine learning, and artificial intelligence. Information, 11 (4), 193. (Publisher: Multidisciplinary Digital Publishing Institute)

Roski, J., Maier, E. J., Vigilante, K., Kane, E. A., & Matheny, M. E. (2021). Enhancing trust in AI through industry self-governance. Journal of the American Medical Informatics Association, 28 (7), 1582–1590. (Publisher: Oxford University Press)

Sandvig, C., Hamilton, K., Karahalios, K., & Langbort, C. (2014). Auditing algorithms: Research methods for detecting discrimination on internet platforms. Data and discrimination: converting critical concerns into productive inquiry, 22, 4349–4357.

Satterthwaite, F. E. (1946, December). An Approximate Distribution of Estimates of Variance Components. Biometrics Bulletin, 2 (6), 110. Retrieved 2023-01-11, from https://www.jstor.org/stable/10.2307/3002019?origin=crossref doi: 10.2307/3002019

Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C., . . . Araya-Ajoy, Y. G. (2020, September). Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution, 11 (9), 1141–1152. Retrieved 2022-12-27, from https://onlinelibrary.wiley.com/doi/10.1111/2041-210X.13434 doi: 10.1111/2041-210X.13434

Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019, January). Fairness and Abstraction in Sociotechnical Systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 59–68). Atlanta GA USA: ACM. Retrieved 2022-09-07, from https://dl.acm.org/doi/10.1145/3287560.3287598 doi: 10.1145/3287560.3287598

Smith, B., & Linden, G. (2017, May). Two Decades of Recommender Systems at Amazon.com. IEEE Internet Computing, 21 (3), 12–18. Retrieved 2022-12-20, from http://ieeexplore.ieee.org/document/7927889/ doi: 10.1109/MIC.2017.72

Timmers, H., & Wagenaar, W. A. (1977, November). Inverse statistics and misperception of exponential growth. Perception & Psychophysics, 21 (6), 558–562. Retrieved 2023-01-10, from https://doi.org/10.3758/BF03198737 doi: 10.3758/BF03198737

Van Rossum, G. (2007). Python Programming language. In USENIX annual technical conference (Vol. 41, pp. 1–36). (Issue: 1)

Villadsen, A. R., & Wulff, J. N. (2021, July). Statistical Myths About Log-Transformed Dependent Variables and How to Better Estimate Exponential Models. British Journal of Management, 32 (3), 779–796. Retrieved 2023-01-12, from https://onlinelibrary.wiley.com/doi/10.1111/1467-8551.12431 doi: 10.1111/1467-8551.12431

Wells, J. R., Danskin, G., & Ellsworth, G. (2018). Amazon.com, 2018. Harvard Business School Case Study (716-402).

Wood, S. N., Pya, N., & Säfken, B. (2016, October). Smoothing Parameter and Model Selection for General Smooth Models. Journal of the American Statistical Association, 111 (516), 1548–1563. Retrieved 2023-01-12, from https://doi.org/10.1080/01621459.2016.1180986 (Publisher: Taylor & Francis; eprint: https://doi.org/10.1080/01621459.2016.1180986) doi: 10.1080/01621459.2016.1180986

Zhu, F., & Liu, Q. (2018). Competing with complementors: An empirical look at Amazon.com. Strategic Management Journal, 39 (10), 2618–2642. Retrieved 2023-01-11, from https://onlinelibrary.wiley.com/doi/abs/10.1002/smj.2932 (eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/smj.2932) doi: 10.1002/smj.2932

Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. Profile Books.

This paper is available on arXiv under a CC 4.0 license.