Recently I asked our favorite large language model, ChatGPT (GPT-4o), to help me with my football betting game. My friends, family and I usually run a betting game during the football (soccer) World Cup or Euro Cup. I wrote a little R/Shiny app called SoccerSuccer and invite my friends and family members to enter their bets for each game. There are complicated rules for points and rankings, some funny visualizations and sophisticated analyses to understand the patterns of bets and points. At the end we have a little gift for the winners, and it’s usually fun for several generations.
A few things need improvement, including how I enter all the games of the tournament. My original plan was to copy the teams, kick-off time and stadium from the respective Wikipedia page. For the current tournament this is the UEFA Euro 2024 page. However, I never managed to automate this process. Usually, just a few days before the tournament, I realize that it’s already too late to catch up with my plan. Then I always find myself manually writing SQL commands for the database like the following:
INSERT INTO game (gameid, team1, team2, city, starttime, kogame) VALUES (1, 'Germany', 'Scotland', 'Fußball Arena München, Munich', '14 June 2024 21:00+2', FALSE);
You don’t need to be a programmer to understand the line of SQL code above. It enters game number 1, Germany vs. Scotland, of the UEFA Euro into my little database. I have to be very careful not to mix up the teams or kick-off times, otherwise this leads to plenty of follow-up issues. For example, if the competing teams are wrong, players of the betting game could bet on games that are never played, or miss others that are not scheduled in my app.
So I asked my little adviser, OpenAI’s ChatGPT, to help me extract the game data from the Wikipedia page and provide me with correct SQL code for the 36 games of the first round of Euro 2024. To my surprise, ChatGPT immediately understood my request and happily produced plenty of SQL commands for the first 30 games. Line number 1 was identical to my code example (see above). The following lines were syntactically correct, but the content was wrong. Suddenly teams like Wales and Sweden, which didn’t make it through qualification, were playing in stadiums that were not selected, like the Weserstadion in Bremen.
However, if you are not into football, the data for these football games looks plausible. Here is an example:
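One way to reduce the risk of typos in hand-written SQL is to keep each game as a small structured record and generate the statement from it. The following is only a minimal sketch in Python, not part of SoccerSuccer: the names `Game`, `sql_quote` and `make_insert` are made up for illustration, and only the table and column names come from the SQL line above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """One fixture; field names mirror the columns of the game table."""
    gameid: int
    team1: str
    team2: str
    city: str
    starttime: str
    kogame: bool = False


def sql_quote(text: str) -> str:
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def make_insert(game: Game) -> str:
    """Build one INSERT statement in the same shape as the hand-written one."""
    if game.team1 == game.team2:
        raise ValueError(f"game {game.gameid}: a team cannot play itself")
    values = ", ".join([
        str(game.gameid),
        sql_quote(game.team1),
        sql_quote(game.team2),
        sql_quote(game.city),
        sql_quote(game.starttime),
        "TRUE" if game.kogame else "FALSE",
    ])
    return (
        "INSERT INTO game (gameid, team1, team2, city, starttime, kogame) "
        f"VALUES ({values});"
    )


# The opening game, using the same values as the SQL line above.
opening_game = Game(1, "Germany", "Scotland",
                    "Fußball Arena München, Munich", "14 June 2024 21:00+2")
print(make_insert(opening_game))
```

For game 1 this prints the same statement as the hand-written line. If a script talks to the database directly, parameterized queries are usually the safer choice; generating text is shown here only because it matches the manual workflow.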
INSERT INTO game (gameid, team1, team2, city, starttime, kogame) VALUES (14, 'Wales', 'Switzerland', 'Weserstadion, Bremen', '19 June 2024 21:00+1', FALSE);
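A simple plausibility check would have flagged this line without any football knowledge. The sketch below is an assumption on my side, not something ChatGPT or my app provides: it compares each generated statement with a hand-entered list of the 24 qualified teams and the 10 host cities (both worth double-checking against the Wikipedia page), and it expects the `+2` offset of German summer time. The function name `check_statement` is hypothetical, and the regular expression only handles statements in exactly the shape shown above, without quotes inside the values.

```python
import re

# Hand-entered reference data for UEFA Euro 2024 (verify against Wikipedia).
QUALIFIED_TEAMS = {
    "Germany", "Scotland", "Hungary", "Switzerland", "Spain", "Croatia",
    "Italy", "Albania", "Slovenia", "Denmark", "Serbia", "England",
    "Poland", "Netherlands", "Austria", "France", "Belgium", "Slovakia",
    "Romania", "Ukraine", "Turkey", "Georgia", "Portugal", "Czech Republic",
}
HOST_CITIES = {
    "Berlin", "Cologne", "Dortmund", "Düsseldorf", "Frankfurt",
    "Gelsenkirchen", "Hamburg", "Leipzig", "Munich", "Stuttgart",
}

# Matches only the exact statement shape used in this post.
INSERT_PATTERN = re.compile(
    r"VALUES \((\d+), '([^']*)', '([^']*)', '([^']*)', '([^']*)', (TRUE|FALSE)\);"
)


def check_statement(sql: str) -> list[str]:
    """Return a list of human-readable problems found in one INSERT line."""
    match = INSERT_PATTERN.search(sql)
    if match is None:
        return ["statement does not match the expected INSERT shape"]
    gameid, team1, team2, city, starttime, _kogame = match.groups()
    problems = []
    for team in (team1, team2):
        if team not in QUALIFIED_TEAMS:
            problems.append(f"game {gameid}: {team} did not qualify")
    # The city column holds 'Stadium, City'; keep the part after the last comma.
    host = city.rsplit(",", 1)[-1].strip()
    if host not in HOST_CITIES:
        problems.append(f"game {gameid}: {host} is not a host city")
    if not starttime.endswith("+2"):
        problems.append(f"game {gameid}: unexpected UTC offset in '{starttime}'")
    return problems


generated = (
    "INSERT INTO game (gameid, team1, team2, city, starttime, kogame) "
    "VALUES (14, 'Wales', 'Switzerland', 'Weserstadion, Bremen', "
    "'19 June 2024 21:00+1', FALSE);"
)
for problem in check_statement(generated):
    print(problem)
```

For the generated line above, the check reports three problems: Wales did not qualify, Bremen is not a host city, and the offset is `+1` instead of `+2`. It cannot tell whether two qualified teams really meet on that day, so comparing the result with the official schedule is still necessary.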
I have a basic understanding of large language models and believe I understand why these hallucinations occurred. There is definitely more important information than football game data. If this technology contributes to the spread of wrong information, such as incorrect medical information or false rumors, large language models can become pretty dangerous for some people. Imagine taking the wrong drug, or rumors damaging your reputation with your business partners.
I asked ChatGPT the following questions and got answers with a lot of food for thought.
- Do you know the Austrian lawyer Max Schrems and the dispute about hallucinations of LLMs? You may use the browsing tool.
- Can you estimate whether this issue will limit the use of LLMs in Europe?
Try it out! ChatGPT’s answers sound perfectly plausible.