Critical use of GenAI: Advice
Asking machines that have no understanding for advice sounds like a terrible idea, but I have to admit it often works well. I personally use LLMs for software support all the time. Using chatbots for troubleshooting installations is a smoother experience than reading documentation, and if it doesn't work, iterative prompting usually gets me to the right place. While I realise this might not generalise to other domains, I can certainly see why people use it to get feedback on drafts, plan their exercise routines, receive guidance on study planning and check the quality of a project plan.
The Brachman category of Advice is a thorny one. We seek advice if we do not trust our own judgment – or maybe if we can't be bothered to put in the effort – and that does not put us in the right position to critically evaluate the advice we receive. So how could students engage critically with this particular use of GenAI?
One way to look at this is to compare GenAI to experts. If you want people to think critically about opinions and advice handed out by experts, you remind them that any expert should be able to properly explain their position. We are all epistemically vigilant and can tell good explanations from bad ones. It is perfectly fine to recognise the authority of an expert, but to trust them only to the extent that their explanations make sense to you.
Of course, LLMs are not experts; they are bullshitters. Many of the signals we pick up when humans are talking nonsense are simply not available when interacting with a chatbot. However, there are still ways to probe justifications, and those are important for this category of use cases.
In their ontology of GenAI use, Brachman et al. (2024) distinguish between Improve, Guidance and Validation. Let's explore these subcategories and see how students can maintain a critical attitude for each of them.
Improve
I suspect it is quite common for students to submit a text to an LLM for feedback or improvement. The uncritical way for them to do this would be to just accept all changes. A more deliberate version would be to look at the suggested changes and categorise them in one of three buckets:
- A change in style or tone
- A change in clarity or structure
- A change in meaning
Then, for each bucket, they could ask themselves whether they agree with the change. Is it stylistically better? Is it indeed clearer? Why is that? And if there is a change in meaning, is that a fix, or did the suggestion instead break the intended (and accurate) meaning?
Using these different buckets helps to ask the right questions about the feedback. It is also helpful to focus on the important stuff – not fully agreeing with a stylistic change might be less problematic than having GenAI change the meaning of a text.
To truly remain mindful, a good practice would be to reject at least 20% of the changes. While this might lead to rejection of good advice, it helps avoid automation bias by forcing a careful look at the quality of the feedback.
This approach works for writing feedback, but similar approaches should also be possible for other types of improvement requests.
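To make the review routine concrete, here is a minimal sketch of what it could look like in code. It only illustrates the idea: the Change structure and the example data are placeholders of my own, while the three buckets and the 20% threshold come from the practice described above.

```python
# A minimal sketch of the bucket-and-reject routine described above.
# The Change structure and the example data are illustrative placeholders;
# the three buckets and the 20% threshold come from the text.
from dataclasses import dataclass

BUCKETS = ("style/tone", "clarity/structure", "meaning")

@dataclass
class Change:
    suggestion: str
    bucket: str      # one of BUCKETS
    accepted: bool   # the student's own verdict, not the model's

def review_summary(changes: list[Change]) -> None:
    """Tally accepted changes per bucket and flag a too-low rejection rate."""
    for bucket in BUCKETS:
        in_bucket = [c for c in changes if c.bucket == bucket]
        accepted = sum(c.accepted for c in in_bucket)
        print(f"{bucket}: accepted {accepted} of {len(in_bucket)}")
    rejected = sum(not c.accepted for c in changes)
    if changes and rejected / len(changes) < 0.2:
        print("Fewer than 20% rejected: look again before accepting the rest.")

review_summary([
    Change("Replace 'utilise' with 'use'", "style/tone", accepted=True),
    Change("Split the long second sentence", "clarity/structure", accepted=True),
    Change("Rephrase the claim about sample size", "meaning", accepted=False),
])
```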
Guidance
When asked for guidance, an LLM is probably going to pick the beaten path. Humans do this, too, and you can get them to think a bit more creatively by asking them under what circumstances their guidance would fail, or whether they know of any examples where somebody did the exact opposite of their advice and still succeeded.
As when we discussed critical questioning of data analysis, such attempts to increase response variation from LLMs can be very useful for critical thinking. A student who receives a range of options and responses is pushed to think about which advice makes the most sense. Unlike with humans, such questioning does not give insight into an actual reasoning mechanism inside the LLM, but it does provide food for thought. In addition, seeing the "failure modes" associated with the LLM's advice gives students counterarguments that would probably not have surfaced through the 'helpful' demeanour of a chatbot.
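One small, practical way to make this a habit is to keep the follow-up questions as a reusable list and append them to whatever guidance question a student asks. The wording below paraphrases the questions above and is a starting point, not a tested prompt.

```python
# A sketch of variation-seeking follow-ups for guidance questions.
# The wording paraphrases the questions described above.
FOLLOW_UPS = [
    "Under what circumstances would your advice fail?",
    "Do you know of examples where somebody did the exact opposite and still succeeded?",
]

def widen(question: str) -> list[str]:
    """Return the original question plus the variation-seeking follow-ups."""
    return [question, *FOLLOW_UPS]

for prompt in widen("How should I plan my revision for the statistics exam?"):
    print(prompt)
```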
Validation
Validation is checking "whether an artefact satisfies a set of rules or constraints". In a sense, the only thing a critical questioner needs to do is ask "Really?" for each validation provided by GenAI. Iterative prompting or using multiple models can also be a way to identify weaknesses in what is essentially data analysis. If different models give different answers, or if a given model changes its output when prompted multiple times, the validation is not reliable. If this happens for particular rules or constraints, it is worth checking manually whether the artefact complies and feeding those manual results back into the LLM.
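As a rough illustration of what that repetition could look like, here is a sketch using the OpenAI Python client. The model name, the yes/no framing and the wording of the question are placeholders, and any serious check would need more careful parsing of the answers.

```python
# A sketch of the repeat-the-check idea using the OpenAI Python client.
# The model name, the yes/no framing and the question wording are placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def check_constraint(artefact: str, constraint: str,
                     model: str = "gpt-4o-mini", runs: int = 3) -> Counter:
    """Ask the same validation question several times and tally the verdicts."""
    verdicts = Counter()
    for _ in range(runs):
        response = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": (
                    "Does the following text satisfy this constraint? "
                    "Answer 'yes' or 'no' with one sentence of justification.\n"
                    f"Constraint: {constraint}\nText: {artefact}"
                ),
            }],
        )
        answer = response.choices[0].message.content.strip().lower()
        verdicts["yes" if answer.startswith("yes") else "no"] += 1
    return verdicts

# If the tally is mixed (e.g. Counter({'yes': 2, 'no': 1})), or if different
# model names give different tallies, check that constraint manually rather
# than trusting the validation.
```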
Not a sentient being
One piece of advice that applies to every use case, but is especially relevant if you are using an LLM in a mentoring role, is to remind yourself that an LLM is not a sentient being. It is not a machine that understands what is being asked. It associates responses with prompts, and that might not produce the meaningful reflection that genuine guidance requires. This is why exploring the variation in responses and being generally sceptical makes a lot of sense. Any student using GenAI for advice should be careful to also use their own judgment and, if in doubt, seek advice from an actual human.
References
Brachman, M., El-Ashry, A., Dugan, C., & Geyer, W. (2024, May). How knowledge workers use and want to use LLMs in an enterprise context. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (pp. 1-8).