Penny for your Chain-of-Thoughts

This started as an exercise in one of my AI classes at UNT on using Chain-of-Thought prompting to get better AI responses, and I thought it made for an interesting compare-and-contrast between Microsoft’s Copilot (which runs on GPT-4) and OpenAI’s ChatGPT-4 itself. Surprise! They are “almost” the same… but not quite.
Here is an example of using Chain-of-Thought (CoT) prompting with Microsoft’s Copilot, running the GPT-4 architecture, with its responses compared against the OpenAI ChatGPT-4 model. Instead of a sequence of screenshots, I recorded my screen and made a video with my comments in context.
Using Copilot – https://copilot.cloud.microsoft
Reflections and screen capture – https://av.c3eo.us/watch/cTnZfznh9ev
Using ChatGPT – https://chatgpt.com
Reflections and screen capture – https://av.c3eo.us/watch/cTnZhgnh9lC
Both videos are also available from the media page in case these direct links break after an update.
I found that both AIs use a subset of the GPT-4 LLM engine. While both returned correct answers, each answered from a slightly different point of view (POV), and each justified the origin of that POV.
Based on the examples I ran on both Copilot and ChatGPT, the zero-shot and few-shot responses were effectively identical in both AIs.
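For readers unfamiliar with the terminology, here is a minimal sketch of the three prompt styles being compared. The question and the worked example are hypothetical placeholders, not the actual prompts used in the recordings, and the "Let's think step by step" cue is the common zero-shot CoT phrasing rather than anything specific to Copilot or ChatGPT.

```python
# Sketch of zero-shot, few-shot, and chain-of-thought prompt construction.

QUESTION = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?"
)

def zero_shot(question: str) -> str:
    """Zero-shot: ask the question directly, with no examples."""
    return question

def few_shot(question: str) -> str:
    """Few-shot: prepend one or more worked Q/A pairs before the target question."""
    examples = (
        "Q: I bought 3 apples at $2 each. How much did I spend?\n"
        "A: $6\n\n"
    )
    return examples + "Q: " + question + "\nA:"

def chain_of_thought(question: str) -> str:
    """Zero-shot CoT: append a cue that elicits step-by-step reasoning."""
    return question + "\nLet's think step by step."
```

The only structural difference between the styles is what surrounds the question, which is why, for simple questions already well covered by the training data, the responses can end up nearly identical.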
While I understand how sample size can impact the accuracy and resolution of responses, as explained by the Nyquist–Shannon sampling theorem (Keim, 2020; Ozgur, 2024), I’m curious whether the exponential growth of AI training data in the short time between the lecture and reading materials for this assignment (2022 for Suzgun et al. (2022), Min et al. (2022), and Wei et al. (2022); 2023 for Li et al. (2023)) and now, in early 2025, has simply given AI a better baseline working knowledge. For example, the GPT-3 model referenced by Wei et al. (2022) has since been replaced by several iterations of GPT-4, as evidenced in my videos. Maybe for these simple questions there is little to no difference between the zero-shot and few-shot responses because AI is growing and evolving. After all, isn’t that what we hope for AI to do?
I think a reference thought process will still be needed for more complex questions. But just as young kids today take to digital devices much faster than kids did when I was young, it seems like AI is growing up right before our eyes.
Further comments and reflections are in context in the videos linked above.
References
Keim, R. (2020, May 6). The Nyquist–Shannon theorem: Understanding sampled systems. All About Circuits. https://www.allaboutcircuits.com/technical-articles/nyquist-shannon-theorem-understanding-sampled-systems
Li, C., Wang, J., Zhang, Y., Zhu, K., Hou, W., Lian, J., . . . Xie, X. (2023). Large language models understand and can be enhanced by emotional stimuli. arXiv. https://arxiv.org/abs/2307.11760
Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the role of demonstrations: What makes in-context learning work? arXiv. https://arxiv.org/abs/2202.12837
Ozgur, A. (2024, May 2). ENGR76 lectures 9–10: Sampling theorem. Stanford University. https://web.stanford.edu/class/engr76/lectures/lecture9-10.pdf
Suzgun, M., Scales, N., Schärli, N., Gehrmann, S., Tay, Y., Chung, H. W., . . . Wei, J. (2022). Challenging BIG-Bench tasks and whether chain-of-thought can solve them. arXiv. https://arxiv.org/abs/2210.09261
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., . . . Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. 36th Conference on Neural Information Processing Systems. https://arxiv.org/abs/2201.11903