
Synthetic data and ELSI-focused computational checklists—A survey of biomedical professionals’ views

  • Jennifer K. Wagner ,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Writing – original draft, Writing – review & editing

    jkw131@psu.edu

    Affiliations School of Engineering Design and Innovation, Penn State University, University Park, Pennsylvania, United States of America, Department of Anthropology, Penn State University, University Park, Pennsylvania, United States of America, Department of Biomedical Engineering, Penn State University, University Park, Pennsylvania, United States of America, Institute for Computational and Data Sciences, Penn State University, University Park, Pennsylvania, United States of America, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America, Rock Ethics Institute, Penn State University, University Park, Pennsylvania, United States of America, Penn State Law, University Park, Pennsylvania, United States of America

  • Laura Y. Cabrera,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing

    Affiliations Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America, Rock Ethics Institute, Penn State University, University Park, Pennsylvania, United States of America, Department of Engineering Science and Mechanics, Penn State University, University Park, Pennsylvania, United States of America, Department of Philosophy, Penn State University, University Park, Pennsylvania, United States of America, Bioethics Program, Penn State University, University Park, Pennsylvania, United States of America

  • Sara Gerke,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliations Penn State Dickinson Law, Carlisle, Pennsylvania, United States of America, University of Illinois Urbana-Champaign, College of Law, Champaign, Illinois, United States of America

  • Daniel Susser

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Department of Information Science, Cornell University, Ithaca, New York, United States of America

Abstract

Artificial intelligence (AI) and machine learning (ML) tools are now proliferating in biomedical contexts, and there is no sign this will slow down any time soon. AI/ML and related technologies promise to improve scientific understanding of health and disease and have the potential to spur the development of innovative and effective diagnostics, treatments, cures, and medical technologies. Concerns about AI/ML are prominent, but two specific aspects of AI/ML have so far received little research attention: synthetic data and computational checklists that might promote not only the reproducibility of AI/ML tools but also increased attention to the ethical, legal, and social implications (ELSI) of AI/ML tools. We administered a targeted survey to explore these two items among biomedical professionals in the United States. Our survey findings suggest a gap in familiarity with both synthetic data and computational checklists among AI/ML users and developers and among those in ethics-related positions who might be tasked with ensuring the proper use or oversight of AI/ML tools. The findings from this survey study underscore the need for additional ELSI research on synthetic data and computational checklists to inform escalating efforts, including the establishment of laws and policies, to ensure safe, effective, and ethical use of AI in health settings.

Author summary

Many efforts are underway to promote ethical and trustworthy AI/ML for biomedical research and care, including the BRIDGE2AI initiative supported by the National Institutes of Health (NIH). We conducted an exploratory survey of biomedical professionals in the United States to gauge their perspectives on two specific matters that had received little attention to date: synthetic data and checklists focused on ethical, legal, and social implications (ELSI) of AI/ML. We found a preference for actual data over synthetic data and general interest in the possible utility of ELSI-focused computational checklists to increase attention given to important matters (such as addressing bias, preserving privacy, and promoting transparency of AI/ML tools). Our survey findings highlight the need for increased awareness among biomedical professionals on these matters. They also underscore the need for more ELSI research on AI/ML to inform the growing policy efforts underway within the United States and around the world to ensure that biomedical uses of AI are safe, effective, ethical, and trustworthy.

Introduction

Artificial intelligence (AI), machine learning (ML), and related technological advances in data sciences are disrupting society generally and healthcare specifically at a dizzying pace. Regulatory agencies, including the Food and Drug Administration (FDA), have been exploring and developing approaches to AI/ML oversight within existing medical device regulatory frameworks for several years [e.g., 1–8]. However, with the issuance of the Blueprint for an AI Bill of Rights in October 2022 by the White House Office of Science and Technology Policy [9] and with the Executive Order on AI signed by President Biden in October 2023 [10], administrative agencies and private entities are jumping to action to accelerate AI for biomedical purposes while trying to ensure the technologies are developed and deployed responsibly and ethically. The Executive Order emphasizes that potential harms caused or exacerbated by AI, including (but not limited to) harms related to privacy, bias, and discrimination, must be mitigated in healthcare contexts. Among the many mandates of the Executive Order is the establishment of an “HHS AI Task Force” to create, within the next year, a strategic plan—along with policies, frameworks, appropriate regulatory action, guidance, and resources—for AI and AI-enabled technologies in the health sector. The Executive Order further requires the Secretary of Health and Human Services to develop an AI assurance policy, take action to ensure appropriate understanding of and compliance with federal nondiscrimination laws, establish an AI safety program, develop a regulatory strategy for the use of AI in drug development, and continue research funding for AI-related work (such as the AIM-AHEAD program [11]).

One dimension of the AI/ML story that deserves more attention is the use of synthetic data (SD), which has been of increasing interest for precision medicine with several distinct use cases [12]. SD has been characterized as having particular value in biomedical science because it could help overcome institutional and regulatory obstacles to sharing actual patient data, preserve privacy, and reduce “time to insights” [13]. There is no consensus definition of SD [e.g., 14], but one working definition describes SD as “data that has been generated using a purpose-built mathematical model or algorithm, with the aim of solving a (set of) data science task(s)” [15]. There has been little empirical research on how health scientists and practitioners understand the meaning, potential, and limitations of synthetic data; however, given the “high stakes” nature of healthcare applications [14], making sure everyone—whether biomedical AI/ML users (e.g., scientists and clinicians) or those responsible for ensuring adequate patient-participant protections are in place (e.g., ethicists)—is on the same page is critical. Communication and explanation of SD are important both prior to the use of SD (e.g., within informed consent processes) and afterwards when reporting results (e.g., research findings or healthcare outcomes), and many audiences and their corresponding knowledge gaps and perspectives on SD must be considered (including, e.g., academic scholars, journalists, corporate executives, policymakers, patients, communities, trainees/students, and the general public). Misunderstandings could lead to either unjustified reliance on SD or unjustified rejection of it, impeding the realization of its benefits or risking avoidable harm.

Addressing SD-related ethical, legal, and social issues is challenging, at least in part because of the definitional problems; some scholars have already begun noting that SD sits in a “regulatory blind spot” [14]. As with AI/ML more broadly, there are many advocates for soft law or collaborative governance approaches [e.g., 16] that leverage existing laws along with professional norms, and these advocates call on professional societies and members of the technology industry to help set expectations for conduct. Recently, leading members of the technology industry announced their willingness to adhere to voluntary guidelines for AI/ML [17] and even called for legislative reform [e.g., 18] to bring clarity where there is current legal and regulatory uncertainty over responsibilities. Benchmarking [e.g., 19] and computational checklists have been viewed as an attractive option to encourage data scientists developing, improving, and using AI/ML models to approach their work thoughtfully and to disclose metadata that could, among other desirable purposes, facilitate reproducibility. Integrating components of ELSI (i.e., ethical, legal, and social issues) into AI/ML computational checklists has been suggested as having potential benefits [20–22]. But what issues ELSI-focused computational checklists might reasonably address, how they might work in practice, and what might impede or facilitate their adoption have not received much research attention.
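
To make this idea concrete, the sketch below shows one hypothetical way an ELSI-focused computational checklist could be captured as machine-readable metadata accompanying an AI/ML model or dataset release. It is a minimal illustration only: the field names and values are our assumptions for exposition, loosely mirroring the kinds of items respondents later prioritized (steps taken to reduce bias, characteristics of training data, privacy and security steps, and equitable access), not a standard proposed or evaluated in this study.

```python
# Purely hypothetical sketch of an ELSI-focused computational checklist entry
# represented as machine-readable metadata. Field names are illustrative only;
# they do not reflect any standard proposed or endorsed in this study.
from dataclasses import dataclass, field, asdict
from typing import Optional
import json

@dataclass
class ELSIChecklist:
    bias_mitigation_steps: list = field(default_factory=list)          # steps taken to reduce bias
    training_data_characteristics: dict = field(default_factory=dict)  # who/what the model was trained on
    privacy_security_steps: list = field(default_factory=list)         # privacy and security measures
    equitable_access_plan: str = ""                                     # how access is kept equitable
    synthetic_data_used: bool = False
    synthetic_data_generation_method: Optional[str] = None

# Example (invented) entry for a hypothetical model release.
entry = ELSIChecklist(
    bias_mitigation_steps=["re-weighted under-represented subgroups during training"],
    training_data_characteristics={"source": "de-identified EHR extract", "demographics_reported": "yes"},
    privacy_security_steps=["access-controlled training environment"],
    equitable_access_plan="weights available to academic users under a data use agreement",
    synthetic_data_used=True,
    synthetic_data_generation_method="tabular generative model",
)

print(json.dumps(asdict(entry), indent=2))
```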

To help uncover ways in which ethical and trustworthy biomedical AI/ML could be advanced (within and beyond the BRIDGE2AI initiative, an NIH Common Fund program intended to address these complex issues and develop best practices [23]), we undertook a suite of complementary efforts, including qualitative interview research and convening a national academic work group. Here, we report on the exploratory survey research we conducted to surface perspectives among biomedical professionals about synthetic data and ELSI-focused computational checklists.

Methods

A survey instrument was developed and refined based on preliminary findings from key informant interviews (which are under review elsewhere) before being programmed into Qualtrics (Qualtrics LLC; Provo, UT, USA) for online administration. Integration of Tango Rewards Genius enabled delivery of a $25.00 research incentive to those respondents who completed the survey and wished to receive it. The survey instrument (S1 Appendix) included an information page that participants had to view before proceeding to the 30 survey questions, which consisted of a series of questions about the participants themselves and substantive questions about their perspectives regarding (a) synthetic data (which was not defined in the instrument), (b) ethics-focused computational checklists for AI/ML, and (c) AI in society generally. The survey included both closed and open questions. The activities involved in this study (“Exploring the Ethics of Synthetic Data and Artificial Intelligence,” Study00020918) were determined to be exempt research by the Institutional Review Board at Penn State University on August 30, 2022.

A survey recruitment pool was created by identifying adults in the United States with relevant professional expertise who appeared to be employed by an organizational member of the Association of American Medical Colleges (AAMC) [24,25]. Relevant expertise in AI or AI-related ethics was determined based on employment in a relevant role (based on title or description) suggesting that the individual might be an AI/ML developer or user (such as a data scientist, data engineer, informaticist, chief information officer, etc.) or might be involved with AI/ML ethics, policy, or oversight (such as an ethicist, institutional review board member, chief bioethics officer, ethics researcher, etc.). A random sampling of the 544 organizational members of the AAMC (comprising hospital/health systems and medical schools) was taken, and, from those AAMC member organizations, six (6) employees (with an approximately even distribution across AI and AI-ethics roles) were identified from publicly accessible information (such as employee directories and other organizational webpages, press releases, published literature, etc.). A modified Dillman approach [26] was used to administer the survey in which individuals in the recruitment pool were contacted directly by the study team using a series of three email messages: (1) a pre-survey message sent in advance to notify individuals of the forthcoming survey, (2) a message with a URL to the online survey, and (3) a reminder message that again included a URL to the online survey. Initial recruitment messages were sent to N = 771 individuals across 146 organizations. There were N = 49 instances of failed email delivery. Survey responses were collected between June 16, 2023 and August 28, 2023.

Our goal was to collect a sample of N = 250 responses to have sufficient power to perform simple comparisons (e.g., comparisons between subgroups of n = 50 would provide 80% statistical power at alpha = 0.05 to detect relatively small effect sizes of 0.23). Ultimately, however, we were performing this survey for a modest purpose: to augment the perspectives of the members of the academic working group (consisting of eight scholars in addition to the study team) convened to explore these issues. Thus, even a small sample size would have data adequacy [e.g., 27,28] for purposes of issue spotting and generating hypotheses appropriate for subsequent AI ethics research. Microsoft Excel was used to calculate descriptive statistics and conduct Chi-square tests of independence. Chi-square tests of independence were performed only for a subset of questions to assess statistically significant differences in responses among AI developers/users and AI ethicists. For those analyses, respondents who did not self-categorize into either group were excluded.
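
As a minimal illustration of the kind of analysis described above (a sketch only; the study’s analyses were carried out in Microsoft Excel, and the counts below are invented placeholders rather than study data), a chi-square test of independence comparing two professional roles across five familiarity levels could be run as follows:

```python
# Sketch of a chi-square test of independence between professional role and a
# five-level familiarity item. Counts are hypothetical placeholders, NOT the
# study data; they only illustrate the mechanics of the test.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: AI/ML developers/users, AI/ML-related ethicists.
# Columns: not at all, slightly, moderately, very, extremely familiar.
observed = np.array([
    [1, 4, 7, 8, 5],
    [8, 9, 6, 2, 1],
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2_stat:.2f}, df = {dof}, p = {p_value:.4f}")
```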

Results

A total of 82 adults in the United States with relevant expertise responded to our survey; however, seven (7) of the participants failed to respond to any of the substantive questions, and those survey responses were excluded from analysis. Thus, the survey sample for analysis consisted of responses from 75 adults in the United States with relevant expertise, reflecting an overall survey response rate of 10.4% (75/722). Three (3) respondents completed the questions about themselves and synthetic data but then abandoned the survey before reaching the questions pertaining to computational checklists and AI generally; those three responses were retained for analysis. The mean time to completion of the survey was 18.9 minutes. A descriptive summary of our survey participants’ characteristics is displayed in Table 1. While the sample is diverse with regard to age, gender identity, workplace type, professional role, years of professional experience, and geographic region, the sample consists predominantly of individuals who reported being highly educated (96% reporting an advanced degree) and White (71.2%).

Familiarity with Subject Matter

Despite being employed at institutions and in professional roles in which we could reasonably expect individuals to encounter synthetic data and computational checklists, respondents indicated limited familiarity with either. As shown in Fig 1, a majority of respondents (53.3%, 40/75) indicated they were either not at all or slightly familiar with synthetic data, and only 16.0% (12/75) indicated they were very or extremely familiar with synthetic data. The remainder (30.7%, 23/75) indicated they were moderately familiar with synthetic data. Similarly, respondents were overwhelmingly unfamiliar with computational checklists: 70.8% (51/72) of respondents indicated they were either not at all or slightly familiar, 11.1% (8/72) indicated they were moderately familiar, and 18.1% (13/72) indicated they were either very or extremely familiar with computational checklists. As shown in Figs 2 and 3, familiarity with synthetic data and computational checklists varies by professional role. While roughly one-quarter of respondents with an ethics-related role (25.9%, 7/27) indicated they were not at all familiar with synthetic data, all respondents with an AI/ML developer or user role were at least slightly familiar with synthetic data, and more than 40% (41.7%, 10/24) reported being either very or extremely familiar with synthetic data. A null hypothesis of independence of professional role and familiarity with synthetic data was rejected at α = 0.05 (χ2 = 17.2; df = 4; p-value = 0.00179), and a similar null hypothesis of independence of professional role and familiarity with computational checklists was rejected at α = 0.05 (χ2 = 11.0; df = 4; p-value = 0.02622).
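
As a quick arithmetic check (not part of the original analysis), the reported p-values can be approximately recovered from the chi-square statistics and degrees of freedom given above, for example with SciPy; small discrepancies reflect rounding of the reported statistics.

```python
# Recover upper-tail p-values from the reported chi-square statistics and df.
from scipy.stats import chi2

reported = {
    "role x familiarity with synthetic data": (17.2, 4),            # reported p = 0.00179
    "role x familiarity with computational checklists": (11.0, 4),  # reported p = 0.02622
}

for label, (stat, df) in reported.items():
    print(f"{label}: p = {chi2.sf(stat, df):.5f}")
```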

Fig 1. Familiarity with Synthetic Data and/or Computational Checklists.

Illustrated is the reported familiarity with these items for the entire survey sample (N = 75). Dark blue shading refers to familiarity with synthetic data, and light blue shading refers to familiarity with computational checklists.

https://doi.org/10.1371/journal.pdig.0000666.g001

Fig 2. Familiarity with Synthetic Data By Professional Role.

A comparison of familiarity among AI/ML developers/users (N = 24) and AI/ML-related ethicists (N = 27) with synthetic data is shown. Dark blue shading indicates no familiarity; orange shading indicates slight familiarity; grey shading indicates moderate familiarity; yellow shading indicates very familiar responses; and light blue shading indicates extreme familiarity.

https://doi.org/10.1371/journal.pdig.0000666.g002

Fig 3. Familiarity with Computational Checklists By Professional Role.

A comparison of familiarity among AI/ML developers/users (N = 23) and AI/ML-related ethicists (N = 26) with computational checklists is shown. Dark blue shading indicates no familiarity; orange shading indicates slight familiarity; grey shading indicates moderate familiarity; yellow shading indicates very familiar responses; and light blue shading indicates extreme familiarity.

https://doi.org/10.1371/journal.pdig.0000666.g003

Results Specifically Regarding Synthetic Data

A majority of respondents (57.3%, 43/75) indicated they had neither a favorable nor unfavorable opinion of synthetic data. A null hypothesis of independence of professional role and (un)favorable opinion of synthetic data was not rejected at α = 0.05 (χ2 = 3.89; df = 4; p-value = 0.42134). Two-thirds of respondents (66.7%, 50/75) indicated that they generally preferred actual, real data over synthetic data, only one respondent indicated a general preference for synthetic data, and the remainder (32.0%, 24/75) reported no preference. Regarding possible benefits of synthetic data (Figs 4 and 5), a majority of responses suggested that synthetic data could “somewhat” address health information privacy concerns (57.3%, 43/75) and bias concerns (70.7%, 53/75).

Fig 4. Perspectives Regarding Whether Synthetic Data Addresses Privacy Concerns.

Perspectives of all respondents (N = 75) on the extent to which a possible benefit of SD is that it is able to address privacy concerns are shown, with the perspective “not at all” in dark blue, “somewhat” in orange, “mostly” in grey, and “completely” in yellow.

https://doi.org/10.1371/journal.pdig.0000666.g004

Fig 5. Perspectives Regarding Whether Synthetic Data Addresses Bias Concerns.

Perspectives of all respondents (N = 75) on the extent to which a possible benefit of SD is that it is able to address bias are shown, with the perspective “not at all” in dark blue, “somewhat” in orange, “mostly” in grey, and “completely” in yellow.

https://doi.org/10.1371/journal.pdig.0000666.g005

Respondents were divided on whether an institutional review board (IRB) should oversee use of synthetic data in biomedical contexts (Fig 6), with 9.3% (7/75) indicating IRBs never should be involved, 24.0% (18/75) indicating IRBs always should be involved, 41.3% (31/75) indicating IRBs sometimes should be involved, and 25.3% (19/75) being unsure. A null hypothesis of independence of professional role and perspectives about IRB oversight of synthetic data was not rejected at α = 0.05 (χ2 = 0.49; df = 3; p-value = 0.92044). The reasons offered by respondents who elaborated on their perspectives regarding IRB oversight were diverse, with several questioning whether IRBs had the appropriate expertise or authority. Several remarked that synthetic data “doesn’t belong to a human subject”; are not “about human subjects”; or are “not real data” and, as such, indicated that the IRB lacks authority. Some noted oversight might be important even if research with synthetic data is exempt from Common Rule regulation (e.g., “because of the downstream implications for humans”), while others were concerned about IRB overreach. Yet other respondents indicated that IRBs could have a role in the generation of synthetic data but not in their uses. A few noted that whether IRB oversight is appropriate depends upon the method used for generating the synthetic data and whether synthetic data are the only data involved in the activity.

Fig 6. Perspectives Regarding Whether IRBs Should Oversee the Use of Synthetic Data in Biomedical Contexts.

Perspectives of all respondents (N = 75) on the extent to which IRBs should be involved in oversight of biomedical uses of SD are shown, with the perspective “never” in dark blue, “sometimes” in orange, “always” in grey, and “unsure” in yellow.

https://doi.org/10.1371/journal.pdig.0000666.g006

Among those who supported an oversight role for the IRB, points raised included “the integrity of science,” the validity of the synthetic data and avoiding “manipulation of data,” whether proper steps to address bias have been taken, privacy implications for individuals and groups (such as inadvertent representation or disclosure of protected health information), and appropriate presentation of findings to avoid misleading others. For example, one respondent (a self-described biomedical scientist with more than 10 years of experience) noted, “IRB’s [sic] are in a unique position to flag inconsistencies with real world data and, in my opinion, would be the entity with enough authority to do something about erroneous data,” while another respondent (who self-categorized as an ethicist with more than 10 years of experience) explained, “I honestly do not know exactly what ‘synthetic data’ are. Sounds like research fraud to me!” While some respondents expressed a protectionist view that it is better to err on the side of unnecessary oversight for emerging technologies (e.g., “…safer to have the IRB overseeing any field involving new techs such as synthetic data”), others expressed opposition to perceived unnecessary interference of IRBs (e.g., “Synthetic data broadens access by AI/ML developers. IRB oversight creates additional barriers for AI/ML developers.”). Additionally, two respondents who both supported a role for IRBs explained as follows:

“Because human eyes have to look at it. We are not all robots yet.”

“Because there should be someone who looks at what is exactly the PI and the research team trying to do in terms of Human research. The term ‘synthetic data’ is too loosely utilized and how are the ‘synthetic data’ produced can border inferring wrong conclusions based on artificially created data or research misconduct if the wrong data fields are ‘synthetically’ produced and then utilized in the analyses. It will not be easy for the IRB boards to identify the specialists who can look carefully at the proposals that include these ‘synthetic data’ proposals. Also very specific guidelines have to be given to the PIs when they prepare proposals that utilize these techniques.”

Most respondents indicated the quality of synthetic data might be difficult to determine (82.7%, 62/75), and a majority of respondents indicated that researchers using synthetic data might not disclose that their studies rely on synthetic data (65.3%, 49/75), that synthetic data might exacerbate data inequities (65.3%, 49/75), and that synthetic data might disincentivize researchers from engaging individuals, groups, and communities who are underrepresented (50.7%, 38/75). A sizable minority of respondents also indicated synthetic data might cause problems with accountability (45.3%, 34/75), synthetic data might be used to evade human subjects research protections (42.7%, 32/75), and synthetic data uses might disincentivize researchers from returning study findings to individuals, groups, and communities (42.7%, 32/75).

Results Specifically Regarding ELSI-Focused Computational Checklists

Respondents were overwhelmingly supportive of the idea of creating and using computational checklists for ethics-related aspects of AI models and data sets, with 92.9% of respondents (65/70) indicating their support (52.9%, 37/70) or strong support (40.0%, 28/70), as shown in Fig 7. From a prepopulated list of eight items, respondents ranked their top three items by importance for inclusion in an ELSI-focused computational checklist, as shown in Fig 8. The item most commonly ranked in the top three (whether in the #1, #2, or #3 position) was steps taken to reduce bias in the AI model or dataset (74.7%, 56/75), followed by characteristics of the individuals or groups whose data were used to train the AI model or dataset (65.3%, 49/75), and then by a tie (32.0%, 24/75 each) between steps taken to preserve information privacy and security in the design of the AI model or dataset and steps taken to ensure that access to the AI model or dataset is equitable.

Fig 7. Initial Opinion of ELSI-Focused Computational Checklists.

Support for and opposition to the notion of ELSI-focused computational checklists among respondents (N = 70) is shown, with support and strong support displayed in increasing shades of blue and opposition and strong opposition in increasing shades of orange.

https://doi.org/10.1371/journal.pdig.0000666.g007

Fig 8. Ranking Items (by Importance) for Inclusion in an ELSI-Focused Computational Checklist.

Respondents’ (N = 75) prioritization of the top three items for inclusion in an ELSI-focused computational checklist is shown. Each item appears as a different color. Steps taken to reduce bias and characteristics of those whose data were used to train the AI model or dataset were most often ranked in the top three, followed by a tie between steps taken to preserve information privacy and security in the design of the AI model or dataset and steps taken to ensure that access to the AI model or dataset is equitable.

https://doi.org/10.1371/journal.pdig.0000666.g008

Respondents generally perceived that ethics-focused computational checklists would increase the quality of attention given to ethical dimensions of AI models and datasets (73.9%, 51/69), as shown in Fig 9. A majority of respondents (55.7%, 39/70) expressed support (48.6%, 34/70) or strong support (7.1%, 5/70) for the idea of automated processes to validate disclosures of ethics-focused features of AI models and datasets. While support for such automated validation processes appeared to vary by professional role, as shown in Fig 10, a null hypothesis of independence of professional role and support for automated processes to validate disclosures of ethics-focused features of AI models and datasets was not rejected at α = 0.05 (χ2 = 2.73; df = 1; p-value = 0.09840).

Fig 9. Perceived Effect of ELSI-Focused Computational Checklist on the Quality of Attention Given to ELSI-related Issues for AI models and datasets.

Respondents’ (N = 69) perception that an ELSI-focused computational checklist would increase the quality of attention given to those matters appears in dark blue, that it would have no effect appears in grey, and that it would decrease the quality of attention appears in dark orange.

https://doi.org/10.1371/journal.pdig.0000666.g009

Fig 10. Support for Automated Processes to Validate ELSI-Focused Features of AI Models and Datasets.

A comparison of respondents’ (N = 49) support for and opposition to automated processes to validate ELSI-focused features among AI/ML developers/users (N = 23) and AI/ML-related ethicists (N = 26) is displayed, with support and strong support displayed in increasing shades of blue and opposition and strong opposition in increasing shades of orange.

https://doi.org/10.1371/journal.pdig.0000666.g010

A majority of respondents indicated concerns regarding who might enforce adherence to an ELSI-focused computational checklist (68.0%, 51/75), indicated it is unclear how an ELSI-focused computational checklist would be validated or interpreted (60.0%, 45/75), and indicated that the consequences of failing to use an ELSI-focused computational checklist are unclear (50.7%, 38/75). Respondents were divided as to whether the likelihood of adoption was unclear (49.3%, 37/75). Around one-third of respondents (36.0%, 27/75) expressed concern that a checklist approach to ELSI issues would feel like “ethics washing” (i.e., a “rubber stamp” on ethics or a meaningless compliance exercise). Other concerns were less commonly reported by respondents, such as concerns about liability issues (13.3%, 10/75) and concerns about the number of checklists already in existence to consider (16.0%, 12/75).

Generally speaking, roughly half of respondents had a favorable opinion of AI (47.9%, 34/71).

Discussion

The weak familiarity with either synthetic data or computational checklists reported by respondents to this survey could, at first glance, be viewed as a survey limitation; however, we find this to be a striking result. That individuals employed at AAMC member institutions and holding professional positions reasonably anticipated to involve encounters with AI/ML (either as those likely to use or observe AI/ML in practice or as those responsible for oversight of the use of such AI/ML tools) lack familiarity with the subject matter foci of this survey warrants further examination, as it raises urgent questions about whether biomedical professionals are ready for the widespread integration of AI/ML tools. Another explanation, however, is that with the number of AI/ML tools expanding, it might be increasingly unreasonable to expect everyone to be familiar with each. Training beyond those involved with large-scale NIH-supported efforts (such as the BRIDGE2AI initiative [29]) is needed—particularly if the gap in familiarity that was observed between those self-reporting as AI/ML developers or users (or similar) and those self-reporting as ethicists (or similar) is to be closed.

That said, while we do not view the weak familiarity with synthetic data and computational checklists as a limitation of this survey, there are limitations of the survey that do warrant caution in order to avoid over-interpretation and unsupported generalizations. (1) The survey was not administered to the general population but, rather, to a target population marked by high educational attainment. (2) The sample of survey respondents contains very little racial and ethnic diversity. (3) While responses were gathered from across the United States, there was limited response from several states (particularly those located in the West region). (4) The completion rate for our exploratory survey was 10.4% (75/722), so there is a possibility of non-response bias. Nevertheless, the results from this survey are useful in providing a foundation upon which further ELSI research can build—whether on the specific areas of focus explored here or on the ample other AI/ML topics that, to date, also have not been given adequate scholarly attention.

Synthetic data

Regarding synthetic data, our findings show that respondents, regardless of their professional role in biomedicine, generally see the potential value of synthetic data as one way to address health information privacy concerns as well as concerns about bias. The observed preference for non-synthetic (actual) data over synthetic data, which has also been observed elsewhere by others [12], might simply be an artifact of respondents’ general lack of familiarity with synthetic data rather than a deliberate, calculated weighing of relevant factors; however, additional research examining why this preference exists and what factors would influence it is needed. Additional research also is needed to better understand the possible relationship and interacting forces between various motivating factors (e.g., pressure to reduce bias in available datasets, pressure to use actual non-synthetic data, or pressure to protect information privacy) and perspectives on and practices with synthetic data.

The broad divide that we observed regarding the potential role of IRBs when synthetic data are used in biomedical contexts seems to be due to several mutually compatible reasons (as suggested by the open-ended explanations we elicited from respondents). Many explanations seemed to reflect a broad interest in respecting or preserving the limitations on IRB authority (i.e., protection of human subjects) and also in calibrating IRB oversight to the risks involved in the contextualized use of synthetic data (e.g., whether and how synthetic data are being generated and whether already generated synthetic data are being used in particular ways, with particular people, and in particular circumstances).

Transparency is key and needs to be further explored. Currently there is a lack of clear instructions or norms regarding how best to address the generation or use of synthetic data—whether at the time a project involving synthetic data has begun (i.e., disclosure within an institution, such as to an IRB or perhaps a data ethics committee) or at the time a project’s outcomes are being reported (i.e., disclosure to others, such as peer reviewers for a journal). The survey responses we collected—combined with insights we gleaned from interviews with key informants reported elsewhere—suggest that synthetic data might be “under the radar” for many biomedical professionals and that guidance is needed to encourage transparency (e.g., how SD were generated, who generated the SD, how well SD perform relative to actual data, etc.) and promote appropriate consideration at different stages of research and/or implementation. Whether this is the job of an IRB is part of that discussion, but it seems apparent that there needs to be a “guardian” of some form to ensure adequate consideration of the ethical issues raised by synthetic data in the biomedical context.

Furthermore, given the lack of familiarity with synthetic data among ethics professionals in biomedical organizations, as shown by this survey, there seems to be a serious need for ethnographic work to illuminate the synthetic data practices that are underway so that there is sufficient understanding for a rigorous analysis of the ethical, legal, and social implications to be performed.

ELSI-focused computational checklists

Regarding the potential utility of ELSI-focused computational checklists to promote responsible biomedical AI/ML, it was a surprising result that our survey respondents had so little familiarity with computational checklists. Computational checklists, such as those used to promote reproducibility [30,31], are already present in data science and healthcare fields—even if not focused on ethical, legal, and social issues per se. Nevertheless, survey respondents (regardless of professional background) expressed a positive initial opinion of using ELSI-focused computational checklists and thought such tools would increase attention to ELSI details. However, the careful design of such checklists and checklist procedures will be critical to prevent “ethics washing,” to encourage thoughtful engagement with social and ethical issues, and to enable nuanced, context-specific disclosures that are shielded from unjustifiably rigid or binary (right/wrong) judgments and shallow criticisms. Our survey did not reveal any obvious consensus among biomedical professionals regarding the items that should be prioritized, although the survey results generally support the notion that priorities should include efforts to reduce bias, promote transparency regarding the diversity of actual data used to train AI models or datasets, preserve privacy, and ensure equitable access to the AI/ML tools developed. While our survey did not examine the underlying motivating or contributing factors influencing prioritization of items for ELSI-focused computational checklists, it would be important to gain such deeper insights as they could ultimately affect user “buy in.” Who would enforce ELSI-focused computational checklists—and how—are important questions that remain open to further discussion as well.

Conclusion

To conclude, over the past year, we have undertaken several activities to help inform the in-depth efforts of those involved with the ELSI Core for the BRIDGE2AI project and inspire others to dedicate further attention to these matters. Here, we reported the results of our exploratory survey. We separately conducted qualitative research involving key informant interviews, convened a diverse working group of experts intended to help address some of these issues and provide recommendations for policy and practice [32], and hosted a symposium towards responsible biomedical AI [33]. These collective efforts have made it clear that more research is needed, in particular as it relates to preferences for actual versus synthetic data, the role of IRBs or similar bodies in providing oversight, and strategies to mitigate some of the challenges our participants highlighted. Moreover, it is clear that increased awareness of SD and computational checklists among biomedical professionals and beyond is needed if one expects to advance the field in a responsible manner. While time and labor intensive, ethnographic work to understand current (and shifting) behaviors, decisions, and practices is critically needed to move beyond surface-level observations and make impactful contributions toward responsible biomedical AI. Such work would provide important substance for consideration by the growing number of groups working to shape policy around the world (e.g., the United Nations AI Advisory Body empaneled to improve global AI governance [34]; the Coalition for Health AI intended to “build a consensus-driven framework” that will produce “guidelines and guardrails” for responsible health AI systems; and the group convened by the National Academy of Medicine’s Leadership Consortium to develop a code of conduct for healthcare AI [35]).

While synthetic data might currently be a “relatively niche pursuit” [36], the funding opportunities to accelerate AI/ML generally and synthetic data specifically (including but not limited to its use in digital twins and data scientists’ nascent efforts to create longitudinal synthetic health record datasets [37,38]) and the suggestion by some that synthetic data will ultimately replace actual patient data sets in the near future [39] indicate that additional ELSI research on synthetic data practices and perspectives would be worthwhile alongside technical research. Moreover, we join the voices advocating for human-centered design and the inclusion of diverse perspectives (including specifically through interdisciplinary collaboration) [e.g., 40–42] to ensure that AI technologies are developed, used, and refined in ways that both advance health equity and facilitate ethical reflection. We also underscore the need for further ELSI research offering practical insights to help shape emerging codes of ethics and conduct for healthcare AI [43,44] as well as emerging sociotechnical standards such as those supported by IEEE working groups [45].

Supporting information

Acknowledgments

The authors are grateful to Dr. Alex Bui (UCLA) for his encouragement to pursue this bioethics supplement to the “PREMIERE: A PREdictive Model Index and Exchange Repository.” The authors thank students Morgan Schnars, Samiksha Kature, and Pingxu Hao for assistance with the survey recruitment pool and thank students Pingxu Hao and Khalid Smith for assistance engaging the relevant literature over the project’s duration. The authors appreciate the thoughtful discussions of the academic work group convened as part of this project over the past year (involving Daniel Schiff, I. Glenn Cohen, Megan Doerr, Jordan Harrod, Jasmine McNealy, Michelle N. Meyer, W. Nicholson Price, II, and Kristin Kostick Quenet) and appreciate the important contextual information about PREMIERE provided during working group meetings by post-doctoral scholar Dr. Anders Garlid.

References

  1. Beaulieu-Jones BK, Finlayson SG, Yuan W, Altman RB, Kohane IS, Prasad V, et al. Examining the Use of Real-World Evidence in the Regulatory Process. Clinical Pharmacology and Therapeutics 2020 Apr;107(4):843–852. Epub 2019 Nov 14. pmid:31562770; PMCID: PMC7093234.
  2. Gerke S, Babic B, Evgeniou T, Cohen IG. The need for a system view to regulate artificial intelligence/machine learning-based software as medical device. NPJ Digit Med. 2020 Apr 7;3:53. pmid:32285013; PMCID: PMC7138819.
  3. Gerke S, Minssen T, Cohen IG. Ethical and legal challenges of artificial intelligence-driven healthcare. Artificial Intelligence in Healthcare 2020; 295–336.
  4. Babic B, Gerke S, Evgeniou T, Cohen IG. Beware explanations from AI in health care. Science 2021; 373(6552): 284–286. pmid:34437144.
  5. Gerke S. “Nutrition Facts Labels” for Artificial Intelligence/Machine Learning-Based Medical Devices—The Urgent Need for Labeling Standards. The George Washington Law Review 2023; 91(1): 79–163.
  6. Gerke S. Health AI for Good Rather than Evil? The Need for a New Regulatory Framework for AI-Based Medical Devices. Yale Journal of Health Policy, Law, and Ethics 2021; 20:433–513.
  7. U.S. Food and Drug Administration (FDA). Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD): Discussion Paper and Request for Feedback. 2021. Available from: https://www.fda.gov/media/122535/download (Accessed 2023 November 3).
  8. U.S. Food and Drug Administration (FDA). Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan. 2021. Available from: https://www.fda.gov/media/145022/download (Accessed 2023 November 3).
  9. Office of Science and Technology Policy (OSTP). Blueprint for an AI Bill of Rights. 2022 [Internet]. Available from: https://www.whitehouse.gov/ostp/ai-bill-of-rights/ (Accessed 2023 September 26).
  10. Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. 2023 [Internet]. Available from: https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/ (Accessed 2023 October 31).
  11. NIH Office of Data Science Strategy. About the Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity (AIM-AHEAD) Program. [Internet]. Available from: https://datascience.nih.gov/artificial-intelligence/aim-ahead (Accessed 2023 October 31).
  12. Gonzales A, Guruswamy G, Smith SR. Synthetic data in health care: A narrative review. PLOS Digit Health. 2023 Jan 6;2(1):e0000082. pmid:36812604; PMCID: PMC9931305.
  13. Foraker RE, Yu SC, Gupta A, Michelson AP, Pineda Soto JA, Colvin R, et al. Spot the difference: comparing results of analyses from real patient data and synthetic derivatives. JAMIA Open. 2020 Dec 14;3(4):557–566. pmid:33623891; PMCID: PMC7886551.
  14. Giuffrè M, Shung DL. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digit Med. 2023 Oct 9;6(1):186. pmid:37813960; PMCID: PMC10562365.
  15. Jordon J, Szpruch L, Houssau F, Bottarelli M, Cherubin G, Maple C, et al. Synthetic Data–what, why and how? arXiv:2205.03257 [cs.LG], 2022.
  16. Schmit CD, Doerr MJ, Wagner JK. Leveraging IP for AI governance. Science. 2023 Feb 17;379(6633):646–648. Epub 2023 Feb 16. pmid:36795826.
  17. Shear MD, Kang C, Sanger DE. Pressured by Biden, A.I. Companies Agree to Guardrails on New Tools. New York Times. 2023 Jul 21. Available from: https://www.nytimes.com/2023/07/21/us/politics/ai-regulation-biden.html (Accessed 2023 October 31).
  18. Jalonick MC, O’Brien M. Tech industry leaders endorse regulating artificial intelligence at rare summit in Washington. AP News. 2023 Sep 13. Available from: https://apnews.com/article/schumer-artificial-intelligence-elon-musk-senate-efcfb1067d68ad2f595db7e92167943c (Accessed 2023 October 31).
  19. Yan C, Yan Y, Wan Z, Zhang Z, Omberg L, Guinney J, et al. A multifaceted benchmarking of synthetic electronic health record generation models. Nat Commun. 2022 Dec 9;13(1):7609. pmid:36494374; PMCID: PMC9734113.
  20. Jobin A, Ienca M, Vayena E. The global landscape of AI ethics guidelines. Nature Machine Intelligence 2019;1(9):389–399.
  21. Zhou J, Chen F. AI ethics: from principles to practice. AI & SOCIETY 2022:1–11.
  22. American Association for the Advancement of Science (AAAS). Decision Tree for the Responsible Application of Artificial Intelligence (v1.0). 2023 [Internet]. Available from: https://www.aaas.org/sites/default/files/2023-08/AAAS%20Decision%20Tree.pdf (Accessed 2023 October 31).
  23. National Institutes of Health (NIH). Program snapshot. 2022 [Internet]. Available from: https://commonfund.nih.gov/bridge2ai (Accessed 2023 November 3).
  24. Association of American Medical Colleges (AAMC). AAMC Hospital/Health System Members. 2022 [Internet]. Available from: https://members.aamc.org/eweb/DynamicPage.aspx?webcode=AAMCOrgSearchResult&orgtype=Hospital%2FHealth%20System (Accessed 2022 August 16).
  25. Association of American Medical Colleges (AAMC). AAMC Medical School Members. 2022 [Internet]. Available from: https://members.aamc.org/eweb/DynamicPage.aspx?webcode=AAMCOrgSearchResult&orgtype=Medical%20School (Accessed 2022 August 16).
  26. Dillman DA. Mail and internet surveys: The tailored design method. 2nd edition. Wiley, New York; 2007.
  27. Sullivan GM, Feinn R. Using Effect Size-or Why the P Value Is Not Enough. J Grad Med Educ. 2012 Sep;4(3):279–82. pmid:23997866; PMCID: PMC3444174.
  28. Glenton C, Carlsen B, Lewin S, Munthe-Kaas H, Colvin CJ, Tunçalp Ö, et al. Applying GRADE-CERQual to qualitative evidence synthesis findings-paper 5: how to assess adequacy of data. Implement Sci. 2018 Jan 25;13(Suppl 1):14. pmid:29384077; PMCID: PMC5791045.
  29. National Institutes of Health (NIH). NIH launches Bridge2AI program to expand the use of artificial intelligence in biomedical and behavioral research. 2022 [Internet]. Available from: https://www.nih.gov/news-events/news-releases/nih-launches-bridge2ai-program-expand-use-artificial-intelligence-biomedical-behavioral-research (Accessed 2023 September 26).
  30. Nagarajah T, Poravi G. An Extensive Checklist for Building AutoML Systems. AMIR@ECIR. 2019. Available from: https://www.semanticscholar.org/paper/An-Extensive-Checklist-for-Building-AutoML-Systems-Nagarajah-Poravi/104182b8ee37625700a575ba89c584817a270f3f (Accessed 2023 November 3).
  31. Hartley M, Olsson TSG. dtoolAI: Reproducibility for Deep Learning. Patterns (N Y). 2020 Jul 23;1(5):100073. pmid:33205122; PMCID: PMC7660391.
  32. Susser D, Schiff DS, Gerke S, Cabrera LY, Cohen IG, Doerr M, et al. Synthetic Health Data: Real Ethical Promise and Peril. Hastings Center Report. 2024;54(5):8–13. pmid:39487776.
  33. Symposium Towards Responsible Biomedical AI. [Internet]. Available from: https://rockethics.psu.edu/events/a-symposium-towards-responsible-biomedical-ai/ (Accessed 2023 October 31).
  34. United Nations Secretary-General’s High-level Advisory Body on Artificial Intelligence. [Internet]. Available from: https://www.un.org/en/ai-advisory-body (Accessed 2023 October 31).
  35. National Academy of Medicine. Toward a Code of Conduct for Artificial Intelligence Used in Health, Health Care, and Biomedical Science. [Internet]. Available from: https://nam.edu/programs/value-science-driven-health-care/health-care-artificial-intelligence-code-of-conduct/ (Accessed 2023 October 31).
  36. Savage N. Synthetic data could be better than real data. Nature. 2023 Apr 27. Epub ahead of print. pmid:37106108.
  37. Zhang Z, Yan C, Malin BA. Keeping synthetic patients on track: feedback mechanisms to mitigate performance drift in longitudinal health data simulation. J Am Med Inform Assoc. 2022 Oct 7;29(11):1890–1898. pmid:35927974; PMCID: PMC9552284.
  38. Li J, Cairns BJ, Li J, Zhu T. Generating synthetic mixed-type longitudinal electronic health records for artificial intelligent applications. NPJ Digit Med. 2023 May 27;6(1):98. pmid:37244963; PMCID: PMC10224668.
  39. Arora A. Synthetic data: the future of open-access health-care datasets? Lancet. 2023 Mar 25;401(10381):997. pmid:36965971.
  40. Chen Y, Clayton EW, Novak LL, Anders S, Malin B. Human-Centered Design to Address Biases in Artificial Intelligence. J Med Internet Res. 2023 Mar 24;25:e43251. pmid:36961506; PMCID: PMC10132017.
  41. Dankwa-Mullan I. Health Equity and Ethical Considerations in Using Artificial Intelligence in Public Health and Medicine. Prev Chronic Dis. 2024 Aug 22;21:E64. pmid:39173183; PMCID: PMC11364282.
  42. Schicktanz S, Welsch J, Schweda M, Hein A, Rieger JW, Kirste T. AI-assisted ethics? considerations of AI simulation for the ethical assessment and design of assistive technologies. Front Genet. 2023 Jun 26;14:1039839. pmid:37434952; PMCID: PMC10331421.
  43. Adams L, Fontaine E, Lin S, Crowell T, Chung VCH, Gonzalez AA, editors. Artificial intelligence in health, health care and biomedical science: An AI code of conduct framework principles and commitments discussion draft. NAM Perspectives. Commentary, National Academy of Medicine, Washington, DC; 2024. https://doi.org/10.31478/202403a
  44. Novakowski A, Doerr M, Duglan D, Varma S. Top leaders in health and technology just developed an AI code of conduct. Here’s how we can build on it. 2024 Jun 19. [Internet]. Available from: https://sagebionetworks.pubpub.org/pub/dna357fh (CC-BY 4.0) (Accessed 2024 September 15).
  45. IEEE. Synthetic Data. [Internet]. Available from: https://standards.ieee.org/industry-connections/synthetic-data/ (Accessed 2024 September 15).