Ethical Considerations of Using GenAI Tools
Generative AI (GenAI) tools are becoming increasingly popular for a wide variety of uses, including in classrooms. Whether you’re generating images, building slides, or creating summaries of readings, it’s important to be thoughtful about the tools you’re using and the impact they can have on both your students and our world as a whole.
Bias
A GenAI tool is only as good as its training data; if that data contains racist or sexist content, we shouldn’t be surprised when the tool reproduces the same biases. Bias can take several forms, including stereotypical, gender, and political bias, all of which can lead to certain groups being inaccurately over- or underrepresented in outputs.
Bloomberg tested the biases present in the Stable Diffusion text-to-image generator in 2023. When they prompted the model to create representations of jobs considered “high-paying” and “low-paying,” the images generated for high-paying jobs typically showed people with lighter skin tones, while people with darker skin tones featured more prominently in images of low-paying jobs. Bloomberg found similar results when they looked at the gender of the people in the images: Stable Diffusion generated three images of men for every image of a woman. When women did appear in the generated images, they were typically shown in lower-paying and more traditional roles, like housekeeper. Prompts for jobs like “politician,” “lawyer,” “judge,” and “CEO” produced images that were almost entirely of light-skinned men.
Harmful Content
Besides being biased, GenAI can produce content that is harmful in a variety of ways. GenAI can hallucinate content that is not based on actual data and is instead fictitious or unrealistic. It can also be used to produce artificial video or audio content impersonating a person’s likeness. When this kind of content is created with the person’s permission, it’s commonly called “synthetic media”; when it’s created without their permission, it’s referred to as a “deepfake.” Deepfakes are often used to harass, humiliate, and spread hate speech. GenAI has made creating deepfakes easy and cheap, and there have been several high-profile cases in the US and Europe of women and children being abused through their creation.
Policymaking efforts to combat the proliferation of and harm caused by deepfakes have become common both in the U.S. and abroad, with proposals often including disclosure requirements for the use of synthetic media, at least for certain activities. While educational uses of these technologies are unlikely to be restricted or banned, users should strongly consider disclosing their use by default, both in the interest of transparency and in anticipation of future requirements that may apply. It’s also worth considering whether the companies offering these products are well positioned to comply with this quickly evolving regulatory landscape, and whether they are making reasonable efforts to prevent the misuse of their products.
Data
The collection of data used to train GenAI models can raise a variety of privacy concerns, particularly around personal and proprietary data. Some personal data collection can be declined, although the methods of how to do so are often buried in lengthy terms of service that most users don’t read. Those terms of service also cover how the GenAI tool can use the data that you put into the tool via prompting, so you should be cognizant of the kind of information you’re feeding it.
Recently, the Cisco 2024 Data Privacy Benchmark Study revealed that most organizations are limiting the use of GenAI, with some banning it entirely, because of data privacy and security issues. This is likely because 48% of employees surveyed admitted to entering non-public company information into GenAI tools. There’s also a general lack of transparency around what kinds of data sets have been used to train GenAI tools. Although some explicitly state where their training data comes from, many are vague about what the training data was and how they accessed it.
Copyright
Right now, many believe that using content, like books, images, and videos, to train GenAI falls under fair use in the U.S., but multiple lawsuits are currently challenging this notion. If companies are unable to rely on fair use to acquire training data, the effectiveness and availability of GenAI is likely to decrease dramatically; the cost of licensing the enormous amount of data needed would likely drive all but the biggest companies out of the market.
The outputs created by GenAI can have their own copyright issues, depending on how much they pull from the training data. If the image generated by GenAI, for example, is substantially similar to an image in the training data, there could potentially be some liability for copyright infringement if or when the image is used. Many GenAI tools are attempting to avoid this by refusing to generate content that is similar to copyrighted material, but there are ways for creative prompters to get around these restrictions.
Although many GenAI tools claim to be trained on openly licensed content, studies show that when asked about licensing, 70% of the tools didn’t specify the license requirements for the generated work, and those that did often suggested a more permissive license than the original creator intended.
The use of GenAI brings up ethical issues around authorship that are often related to copyright but are separate. For example, when using information gathered from GenAI, there may be an ethical obligation to cite the original source to avoid claims of plagiarism. GenAI doesn’t typically provide citations, and when it does, those citations are frequently incorrect. There are also concerns about the displacement of human authors and artists by GenAI; this frequently comes up when GenAI is used to create works in the style of certain artists or authors.
Environmental Impact
GenAI has a huge environmental impact. Research has shown that training an early chatbot model such as GPT-3 produced as much greenhouse gas as a gasoline-powered vehicle driven for 1 million miles. Generating a single image with GenAI uses as much energy as fully charging a phone, and ChatGPT alone consumes as much energy every day as a small town. On top of that, the data centers that house the training data and infrastructure for these tools require large amounts of electricity, as well as water to keep them from overheating. Right now, it’s nearly impossible to accurately evaluate the full extent of GenAI’s environmental impacts.
Equity
There are several types of equity concerns when it comes to GenAI. Most GenAI tools are trained on data from data-rich languages and are less likely to include non-standard dialects or other languages. There are also access and efficacy disparities. Not everyone will have access to GenAI tools, whether because of cost, a lack of internet access, or accessibility issues with the tool itself. And underrepresented or underserved groups may find their experiences missing from the training data; because the tools are optimized for some groups and not others, the outputs are less effective for them.
Finally, it’s important to remember that all of the legal and ethical issues discussed so far have a disproportionate effect on marginalized groups. For example, negative environmental effects tend to be felt the worst in more vulnerable communities. Considering the major impact GenAI has on the environment, how are we going to work with these groups to help ensure they’re not further harmed?
Conclusion
Overall, there are pretty significant legal and ethical issues we should consider before using GenAI tools. This doesn’t mean that we shouldn’t use GenAI tools; it means that we should be thoughtful about when, how, and why we’re using them. And we should know that the way we use them might change in the not so distant future. The current lawsuits will take years to work their way through the legal system, and depending on how they shake out, GenAI tools may have to go through some major changes when it comes to their training data.
Here are five tips for navigating through these complex issues:
- Investigate the reputation of the GenAI tool and the company that created it. Perform an online search for any potential legal or ethical issues. Add search terms like “complaint,” “violation,” or “lawsuit” with the company’s name, and be sure to read product reviews.
- Check the terms of service. Review the terms of service and privacy policies before using GenAI. Caution should be taken before publishing materials created through GenAI.
- Protect sensitive data. In addition to data shared for training purposes, it should be assumed, unless otherwise stated, that data shared when using GenAI tools will be accessible by the third-party tool provider and its affiliates. Data sharing must adhere to U-M policies.
- Consider the ethics/limitations. Continue to remember, and remind your students, that GenAI tools are often biased, as the technology is designed to output common results based on its learning model. GenAI can also “hallucinate,” so specific claims should always be verified before sharing.
- Consult resources and ask for help. We are still navigating uncharted waters. Use the resources available here at U-M, including the training and workshops on GenAI hosted across the university. There is also a new GenAI as a Learning Design Partner series led by U-M instructors that is freely available via Coursera.