Despite the huge advances in technology that have enabled cheaper, more routine and more widespread use of genomics in healthcare, the datasets that underpin the majority of genomic insights remain dominated by individuals from Western countries, of European ancestry. The research and clinical impacts of this vary and include: misclassification of variant pathogenicity, more Variants of Unknown Significance (VUSs) in non-Europeans, and potential inequalities driven by polygenic score-informed screening developed on non-representative populations. The need to diversify genomic data and improve the evidence base for genomics-enabled personalised medicine goes beyond merely collecting more data; it draws in a range of ethical, legal and social issues including but not limited to: consent, societal implications, implications regarding health inequity and personal responsibility, language and concepts, potential beneficiaries, public-private partnerships and data governance.

This context underpins the rationale behind the creation of Diverse Data, a new initiative led by Genomics England that aims to reduce inequalities in genomic medicine and improve genomic outcomes for underserved communities. It also drives home the centrality of embedding ethics into all aspects of initiatives that aim to improve equity and diversity in genomics, to ensure that any efforts are as robust as possible in their aims, governance, design, and delivery.

There are, of course, huge benefits to setting up initiatives from scratch: you benefit from advances in technology in a space where a few years make all the difference, you can make new and fresh collaborations and connections as “the new kid on the block”, and you can stand on the shoulders of global efforts by learning from the insights and hard work of others from previous years and decades. But as the saying goes, “the road to hell is paved with good intentions”. Or, as said more recently and specifically in relation to genomics, Adam Rutherford wrote in his latest book, Control: The Dark History and Troubling Present of Eugenics: “We must always expect science to be misrepresented, overstated and misunderstood, because it is complex, because the data is unending, and because people are strange”. Whilst the ambitions of the Diverse Data initiative are well-intentioned, there are many ways in which such a programme could fail to do good or, even worse, cause harm.

Some examples of important ethical questions that were raised early for us include:

  • The true or perceived risk of creating a link between race and genomics, including the potential for this work to be misconstrued as giving scientific backing to such a link.
  • The risk of misunderstanding health inequalities by focusing on genomic diversity in the absence of wider socio-economic factors such as deprivation.
  • The potential negative impacts of a new, additional burden from personalised medicine, in the face of poor health outcomes that are better addressed through other means, largely those affecting social determinants of health.
  • The potential unforeseen or unintended consequences of using a consent model that is not culturally tailored for participants.

In order to have the broadest and most systematic understanding of potential risks, the Diverse Data initiative commissioned a review into the potential unintended consequences of efforts to diversify genomic data. We were thrilled that, from a rich field of high-quality submissions, Clinical Ethics, Law and Society at the University of Southampton (CELS) and the Centre for Personalised Medicine at the University of Oxford (CPM) undertook the project.

The CELS/CPM team worked at full speed, reviewing the literature, running workshops, conducting interviews and supporting a deliberative conversation between researchers and participants.

Here are the key takeaways:

  1. Research practices are often exclusionary

Many research practices are exclusionary and need to change. Examples include approaches to recruitment or data collection that do not consider the cultural setting in which potential participants are situated. Research also often lacks reflexivity about diversity on the part of researchers and research institutions.

  2. Co-design is key

Co-design is key to identifying and avoiding potential problems around data diversification. This requires an understanding of the concerns of underserved individuals and communities regarding exploitation and stigmatisation, as well as issues of data ownership and sovereignty. Without attention to group as well as individual concerns, participant engagement may become tokenistic, which in turn risks exacerbating existing inequalities as well as creating new ones.

  3. It is crucial to contextualise these efforts within wider structural issues

There are wider structural issues that influence researchers’ and participants’ attempts to generate diverse data. For example: (a) some researchers view data as neutral, but this ignores the social construction of data and technologies, and their tendency to reflect societal inequalities; (b) efforts to diversify data should be contextualised within the historical trajectory of structural racism and legacies of colonialism; (c) classification and categorisation of populations have political consequences and need to be closely interrogated.

  4. Conclusion

The review concluded that it is important to move beyond treating the recruitment of individuals from under-represented groups as the endpoint. Having more diverse datasets is not inherently more ethical: if the broader historical, political, legal or social factors that shape the environments of potential participants are ignored, the risk of exacerbating existing inequities remains high. The review detailed these factors extensively, alongside key recommendations for the Diverse Data initiative as it embarked on an ambitious but ethically complex path.

These recommendations included:

  • Positioning ethics at the forefront of research and throughout, but in doing so, ensuring that it is seen as more than just gaining research or institutional review board approval;
  • Placing co-design at the heart of these initiatives, with participants as active researchers and knowledge makers;
  • Shifting the focus from diversifying data towards enabling diverse ways of knowledge making, which must be largely driven by a research culture that incentivises diversity through all aspects of the research process;
  • And, of course, reiterating that data diversification needs to consider the broader socio-political context.

We will be working on publishing numerous outputs from this work over the coming year, including research papers that deep-dive into specific themes, as well as a roadmap for the practical implementation of ethics in data diversity activities, so that initiatives like Diverse Data at Genomics England can have a blueprint of what good practice might look like for us all to aim for. In addition, this work has already helped the Diverse Data initiative develop an ethics agenda and roadmap, guide the design of future ethics work, and shape design elements of the programme.

Full report here.

If you’re interested in collaborating, or want to see early drafts of this emerging work, don’t hesitate to get in touch at and