How AI really works | Don't believe what they say about AI | Andriy Burkov
-
In this conversation, Andriy Burkov discusses his journey in AI and machine learning, touching on significant life decisions, the evolution of AI, the limitations of large language models (LLMs), and the challenges posed by hallucinations in AI responses.
-
He emphasizes the importance of data in AI development, the implications of AI agents, and the future of artificial general intelligence (AGI).
-
Additionally, he addresses copyright issues related to AI-generated content and the need for data privacy.
-
The conversation concludes with reflections on the rapid pace of technological change and its impact on society.
-
00:00 Embracing Change: Andriy's Journey to Writing
-
05:31 Understanding AI: Definitions and Concepts
-
09:45 The Mechanics of Large Language Models
-
14:01 The Illusion of Intelligence: Hallucinations in AI
-
18:23 Navigating Limitations: Use Cases for AI
-
22:42 Leveraging Internal Data: Challenges and Solutions
-
26:26 The Future of AI: Addressing Quality and Reliability
-
38:36 The Challenge of Distinguishing AI-Generated Content
-
43:17 Copyright Issues in the Age of AI
-
50:28 Data Privacy and LLMs
-
56:25 The Future of AI Agents
-
01:00:46 Predictions for AGI and the Future
-
01:06:17 Reflections on Career and Life Choices
-
01:08:33 Navigating the Digital Future
-
01:12:02 New Chapter
Andriy (00:00) Our first idea about anything new will be: 99% it's fake, 99% it's a bot. Now it's not just information, it's manipulation on a massive scale across social networks, and we consume all of it. And the more you consume, the more anxious you become.

ICEO Technologies (00:17) Andriy Burkov is one of the most influential AI experts in the world. He is the author of several best-selling AI books, like The Hundred-Page Machine Learning Book.

Andriy (00:27) Those CEOs, in their interviews, don't shy away from saying that in two years, in five years, it will become so intelligent that people will be out of jobs. I doubt it, because they don't have any data or science to confirm these claims. I think that there will be a degradation of the quality of the web, and in the worst-case scenario it will become absolutely unusable and we will be back to our 1990s. There was no Google. It was AltaVista, and AltaVista was impossible to use. It was spam after spam after spam.

ICEO Technologies (00:58) Andriy has a PhD in AI, worked in leading AI roles at multinationals like Gartner, and has over 1 million AI enthusiasts who follow his insights across his popular LinkedIn newsletter and social media platforms.

Andriy (01:09) If someone says they know how to solve the hallucination problem, it's wrong. People either lie to you for some reason or they don't really understand what they're doing. They always want us to believe that something is more scary than it actually is, because if you are scared, you keep consuming this content and they sell...

Diego Calligaro (01:29) Hi Andriy, very glad to have you here. How are you?

Andriy Burkov (01:32) Hi Diego, glad to be here. I'm doing well, you too?

Diego Calligaro (01:35) Very good, very good, thank you. So, Andriy, we have a lot of topics that we'd like to discuss with you today. I would really love to start from a quote of yours, which I really liked, about big decisions that you've made recently. I'm reading your words here: "In the life of every person, there are choices that define the rest of it. In my life, there were several such moments: choosing computer science as a major, deciding to learn French, moving to Canada, having kids, doing a PhD, writing my first book. Each of these decisions defined who I am today." And you recently decided to resign from a full-time job and become a professional book writer. So, congratulations on this major decision. Tell us, why did you take this decision, and how do you feel about it?

Andriy Burkov (02:33) Yeah, thanks. Thanks for the congratulations. Actually, it's a funny thing, because I wasn't sure how I would feel after I resigned: it's been more than 20 years that I have been a full-time employee or full-time student and so on. I wasn't sure whether I would feel like it's always Monday or it's always Friday. And I'm really happy to confirm that I really feel like it's always Friday.

So why did I decide to do it? Well, there is never a perfect moment to do anything in life, especially something big, like having kids or buying a house or traveling. But I think this is the best it could be. My kids are already grown-ups: I have two daughters, one 17 and one 18, and they are studying here in Canada, so it's almost free. And I published my new book recently and it was really well received. So now I have a series.
And I decided that this is what I always wanted, you know: just to live your life the way you want, do what you love, spend time with your family, and we will see what happens. For more than 20 years, with the immigration and the children and so on, you kind of don't belong to yourself anymore. You belong to the family and you belong to the future. So you work for it, and sometimes you do things that you don't necessarily want to do, but you have to, because you have responsibilities. And I just thought: okay, I did my part, and now I can go back to being a guy once again and just do what I feel I want to do.

Writing is my passion, so I decided to give it a chance. And I don't miss a full-time job. I took several contracts as an advisor to companies that work with AI, and I really like helping people start a new company, figure out how to better position it, what features to develop, where to concentrate resources, and so on. It takes me up to 10 hours a week, and the rest of the time I can write. And not just write: I can code something. I like to tinker with code, and I have some ideas. And now, with LLMs, it's so simple to get that first working prototype. Previously, if you wanted to build something, you had to learn a lot. For example, to build a web application today, you have to learn some front-end framework like React, you have to learn a backend in Python or JavaScript; there are databases, security, multi-processing, lots of stuff. And now you just tell an LLM what you want as an application, and very quickly it gives it to you. Of course, it will not be a full-featured application, but it's a good starting point, and then you can spend your time tweaking it and making it better and better. So today is the best moment for someone who wants to do something that they like to do. So I decided to try.

Diego Calligaro (06:12) Thanks for sharing, and let's jump straight to your field of expertise, where you're a top leader: AI and machine learning. Maybe, to start, tell us: what is AI, and what are large language models like ChatGPT?

Andriy Burkov (06:31) Well, the definition of AI is that when you make computers do something that previously only people were capable of doing, it's called artificial intelligence. By this standard, even a calculator is artificial intelligence. The only reason we don't call calculators artificial intelligence anymore is because it's a solved problem: there is nothing you can add to a calculator to make it better at adding or subtracting numbers. So at some point, work on calculators was considered part of AI; today it isn't. Today, an example of something that computers couldn't do, and now they can, is having a conversation. Just before ChatGPT, two years ago, it was hard to imagine a computer that could maintain a consistent multi-turn conversation with a person, without the person having to choose their words carefully so that the computer understands. Today it's a reality, and we still see that there is room for improvement, so a lot of research and engineering is still ongoing.
So we still consider this AI, but maybe at some point, when computers reach the maximum of conversational capabilities and we can show that there is nowhere else to go, it might become just part of our normal life. So this is AI.

LLMs are just one way of building AI. LLM stands for large language model. Basically, it's a neural network trained on large quantities of text, normally, though today they also add pictures, video, and sound. The way it's trained is that you show it a document. Usually you split the document into individual pieces that we call tokens, but we can think in terms of words. So you show it a document split into a sequence of words, and you train the parameters of this neural network so that when it reads a sequence of words, it predicts what the next word in the sequence would most likely be. If the model makes a wrong prediction, you adjust the parameters so that next time, for this specific sequence, it makes a better prediction. And if you show this model millions, and today we even talk about billions, of documents, then eventually predicting the next word for any sequence becomes simple for it; simple in the sense that it has seen so many similar sequences that it almost perfectly predicts the next word for each given sequence.

On its own, such a model is kind of useless, because it will just predict the next word: you show it an example of text, and it says the next word will be "the". What would you do with this model? Nothing. But scientists discovered that if you take such a model, pre-trained on a lot of text, and then train it slightly more, on examples of conversation, you can show it a question and train it to predict the next tokens in a way that the predicted tokens form an answer. By showing the model many examples of multi-turn conversations like this, its parameters adjust so that it doesn't just predict any next word that makes sense, but the next word that makes sense so that it looks like an answer. And somewhat by chance, it was discovered that these models are actually capable of generating sequences that look like someone is talking to you. But of course, no one is talking to you. It's just a model that was trained to follow patterns.

And we people are good at patterns: nature made us able to distinguish the smallest changes in the facial expressions of the people we talk to, because it was important for our survival. Reading and having conversations is built into our brain. So when we see a dialogue, we believe that this dialogue is important, and we believe that someone is behind the screen talking to us. But in reality, the machine is just good enough at reproducing these dialogues that when we look at them, we believe we are talking to someone.
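To make the next-word training just described concrete, here is a toy, editor-made sketch in Python: a trainable bigram table that is nudged whenever it mispredicts. Real LLMs use deep networks over tokens at a vastly larger scale, but the learning signal is the same idea.

```python
# Toy next-word model: read a sequence, predict the next word, adjust on error.
import numpy as np

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))
ix = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

logits = np.zeros((V, V))  # one row of parameters per context word
lr = 0.5

for epoch in range(200):
    for prev, nxt in zip(corpus, corpus[1:]):
        p = np.exp(logits[ix[prev]])
        p /= p.sum()                   # softmax: a distribution over the vocab
        grad = p.copy()
        grad[ix[nxt]] -= 1.0           # cross-entropy gradient
        logits[ix[prev]] -= lr * grad  # nudge parameters toward the truth

def next_word(w: str) -> tuple[str, float]:
    p = np.exp(logits[ix[w]])
    p /= p.sum()
    return vocab[int(p.argmax())], float(p.max())

print(next_word("sat"))  # ('on', ~1.0): always seen, so near-certain
print(next_word("the"))  # 'the' precedes four different words: probability splits
```

On this tiny corpus the model becomes near-certain that "on" follows "sat", while after "the" the probability mass splits across four words: the same next-word distribution that comes up again below.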
Diego Calligaro (11:24) Yeah, because LLMs can create a perception that there is some sort of intelligence, or that they can create, let's say, a new concept, while they're actually trained on training data. So the creation of a new concept, from what you're saying, is not currently possible with LLMs.

Andriy Burkov (11:46) Yeah, so as I said, they are pre-trained on a lot of text. And when I say a lot, like when I say a billion documents, or 100 billion documents, for you and me it's the same, a billion, 100 billion; even for many people, a million and a billion don't really feel different. But in practice, the difference between a million and a hundred billion is enormous. It's hard to even imagine how different these quantities are. So when you show it such a huge, hard-to-imagine number of documents, everything humanity produced over the years, especially since the internet, it's just hard for any human being to visualize how much information is contained inside. And when all this information is plugged into a neural network, and the neural network manages to fake or simulate a conversation, for many people it seems like: okay, it knows so many things, no way all of it was already somewhere online; it must be original, it must be creative. But no: for every response you can get from an LLM like ChatGPT, there is some document somewhere responsible for it.

And if, for some domain of conversation, there were not enough documents, this is where we see hallucinations. There is nothing inside the LLM that makes it distinguish between what it knows, what it doesn't know well, and what it doesn't know at all. When you ask a question about something it knows, something it has seen in the training data, it will give you a good answer most of the time. But if you ask it something to the right or to the left of that, something it didn't see, it will still give you an answer, an answer that fills the blanks with unreliable information. You understand that it doesn't look right, but for the neural network, all parameters are the same. There is no "this parameter is more trustworthy and this one isn't". They are all the same.

Diego Calligaro (14:02) Because sometimes the explanation is also given with this high confidence, you know. You're saying that it doesn't understand whether it's something it knows or doesn't know. Is that the reason why it answers everything with such high confidence, even when the answer is, not a lie, but an error?

Andriy Burkov (14:24) Well, this confidence doesn't come from the neural network architecture per se. It comes from the data we use to train it to have a conversation. We call this stage fine-tuning. During fine-tuning, we show the model examples of conversations, and the model adjusts its token prediction to match these examples from a stylistic point of view. So if, in your training examples, someone asks a question and someone answers it, and the answer reads like "yeah, I see what you mean, okay, here is your answer", then the machine will learn to use these small phrases, like "okay, I understand" or "now I see what you mean". And it's not because it actually sees it; it's just because it follows the patterns that were in the fine-tuning data that we humans provided to fine-tune it to have conversations. At no point in a conversation, when the model says "now I'm sure" or "this is your solution", does it actually think that it is sure and that this is your solution. It just follows the patterns that were used to fine-tune it to have conversations.
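Concretely, the conversational fine-tuning data Andriy describes is typically a set of short role-tagged exchanges; the sample below is a hypothetical illustration of what such data can look like. The model learns the style of the assistant replies, including the confident-sounding phrases, from examples like these.

```python
# Hypothetical fine-tuning examples: the model imitates the reply style,
# including phrases like "I see what you mean", not any inner certainty.
fine_tuning_examples = [
    {"messages": [
        {"role": "user", "content": "My code throws a KeyError. Why?"},
        {"role": "assistant",
         "content": "I see what you mean. A KeyError means the dictionary "
                    "has no such key. Check that the key exists first."},
    ]},
    {"messages": [
        {"role": "user", "content": "Is 1 + 1 equal to 2?"},
        {"role": "assistant", "content": "Yes, I'm sure: 1 + 1 = 2."},
    ]},
]
```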
And, to go a little bit more into technical detail: when a neural network predicts the next word, it doesn't just output this word. It outputs a distribution, a probability distribution over all the words of the vocabulary. The developers of the chatbot usually pick the word with the highest probability in this distribution. So sometimes we use the probability of the word as the confidence the model gives to this word. For example, if the model says the next word should be "plane", the confidence for "plane" might be 95%, and for the rest of the words it will be 0.0000-something. But even if you take these probability scores as confidence, it doesn't mean the model is confident about the answer it's about to give, because the model doesn't even know what the second token will be before it predicts the first. It can be super sure that the next token is "plane", but it doesn't know that in the end the conversation will be about ships. It's just that at this specific moment, "plane" was the highest-probability token. And the more tokens are added to the answer, the more it adjusts its answer based on the tokens that have already been predicted.

This is why, for example, the earliest LLMs, two years ago, couldn't correct themselves. Say you ask a question whose correct answer is no, but the model chooses "yes" as the first token. Then it's really hard for it to come to a no in the end, because it already started with a yes, so it has to continue answering the question as if the answer were yes. But today, as many have heard, the most capable models are what we call reasoning models. These reasoning models have a buffer where they can start answering, and at some point they might say, "wait". After "wait", they try to criticize what was previously said. And after several steps of answer, then "wait", then answer again, then "wait", in the end it stops, and based on everything it generated, it gives you the final answer. So the chances that it starts with a mistaken "yes" are now smaller. I explain all this just to show how mechanical it all is. It's not that the model has a clear idea of the answer and gives it to you. It's: here is the first word, then we'll see what the second word will be, and the third. And the engineers and scientists working on these models just try to find hacks, like this reasoning buffer, to make the outcome closer to the truth than it was without the hack.

Diego Calligaro (18:50) And what, from your side, is the solution for reducing these hallucinations when there is not enough data? Is it about continuously expanding the data to increase correctness, or can that even backfire?

Andriy Burkov (19:06) Well, the problem is that the people who build the chatbots we use pretend that eventually this problem will be fixed, that we just need to work a little more on it. But in reality, you have to understand that it's all about the dataset. It's not about the model itself, it's not about how you train it; the core capabilities are there in the dataset.
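Before moving on, the token-by-token mechanics Andriy described a moment ago can be sketched in a few lines. This is an editor's illustration only: `next_distribution` is a hypothetical stand-in for one forward pass of a real model, rigged so the toy model commits to "yes" as its first token.

```python
# Autoregressive decoding: predict a distribution, commit to one token,
# append it, repeat. The model never plans the whole answer in advance.
def next_distribution(tokens: list[str]) -> dict[str, float]:
    if tokens == ["Is", "it", "safe", "?"]:
        return {"yes": 0.95, "no": 0.05}  # high "confidence" in one token
    return {".": 1.0}

def generate(prompt: list[str], max_new: int = 5) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new):
        dist = next_distribution(tokens)
        token = max(dist, key=dist.get)  # greedy: take the likeliest token
        tokens.append(token)             # committed: the rest must fit it
        if token == ".":
            break
    return tokens

print(generate(["Is", "it", "safe", "?"]))  # starts with "yes", stuck with it
```

This is the commitment problem from above: once "yes" is emitted, every later token is conditioned on it; reasoning models mitigate it with the draft-and-"wait" buffer.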
If you think about it this way: our universe is infinite. There is an infinity of, I don't know, an infinity of books we can write about human relationships, because there is always something between two people that is unique and different from all the other books that describe relationships between two people. Just think about how many different relationships there can be between people, between different objects; each object consists of individual pieces, and they can be combined in a multitude of ways, and each piece is built from molecules and atoms, and so on and so on. So what is the chance that you will describe all of it perfectly, in a perfect, self-contained dataset? Well, I would give it zero chance, because, as I said, the universe is infinite. We cover this part; that part is yet to be covered. And the more you try to cover something, the more you see: okay, now I have to go into even deeper detail, because otherwise it's still too superficial. It's like when you learn something: in the beginning you consume information and you're like, okay, I know more and more. But when you actually become an expert, you feel like you don't even know one percent of what you could know about it. It's the same with training language models. There is no way for us to cover everything.

So how else can we get rid of hallucinations? Well, we shouldn't use machine learning for this, because machine learning is the paradigm in which we train these models. The machine learning paradigm says that you can create a model, for example a model that distinguishes between dogs and cats, and you can show it many examples of pictures of dogs and pictures of cats. After a certain number of examples, the model will become almost perfect at distinguishing dogs from cats. But then someone says: I wanted it to predict birds too. They show it a picture of a bird, and it says it's a cat. They show it another bird, and it says it's a dog. And they ask: why does it fail? Because it was trained only on pictures of dogs and cats, but you are testing it on pictures of birds. It's the same with LLMs. Someone asks: what is one plus one? It says two, and they might say: wow, it's super cool, it's so accurate. But then someone asks something in the very specific domain of their own expertise, and they see that it cannot answer, because it wasn't trained on this.

So machine learning as a paradigm is not good for building a true artificial intelligence. For example, what is the difference between an LLM and us in terms of hallucination? Some might say that people hallucinate too, which is true, but we usually call that a mental disorder; we don't say that a person who hallucinates every day is fine. Of course, we people also have problems, but if you take a normal, average human being, this person will be able to say: this I know, and this I don't know. If I'm not a doctor and you ask me whether you have a heart disease, I will say: I'm not a doctor, go see a professional. But the LLM will try to answer your question no matter what. This is because it's not truly intelligent; it just simulates a conversation. So as for how to solve it, nobody knows. We just understand that within the machine learning paradigm, this problem of hallucinations, and the problem of the overall unreliability of these models, cannot be solved.

It would have to be a different paradigm, not based on machine learning, but rather probably based on creating a kind of forever student, okay? You create a kind of child, and this child learns by itself about the world in which it lives. But we need to invent this learning mechanism. We have different perception capabilities: vision, touch, hearing, and so on; we need to give such capabilities to a machine, that's one. And two, we need to create a kind of black box where this perception information comes in and leaves traces in the internal state of the machine. And if, let's say, you have such a student AI living in some environment, we also have to simulate curiosity, because we are curious by nature. We don't ask ourselves why I'm so curious about music, why someone else is so curious about building bridges; we just say: wow, I like it, I will do it, I will learn everything about it. So we need to integrate this curiosity somehow into the model. There is so much we don't even know how to approach. So today, if someone says they know how to solve the hallucination problem, it's wrong. People either lie to you for some reason, or they don't really understand what they're talking about.

Diego Calligaro (24:48) Looking at these limitations, what do you think are the ideal use cases where this technology can be applied?

Andriy Burkov (24:55) Well, there are a lot of cool use cases, but one thing we need to start from is acknowledging that we don't actually know for which use case a model will work and for which it will not. Why? Because, again, it's all about the data. The data that these large companies use to train their models, they don't show it. They keep it hidden, and they explain that it's a competitive advantage: "we don't want to share our secret sauce". But another reason is that if you showed what's inside, it would create a lot of problems from the intellectual property perspective, because many suspect, and some even admit, that pirated books were used to train these models. So we don't know what's inside the dataset, and because of this, we don't know what the model can and cannot do. You cannot even tell, in general, what a language model is good for. Every language model was trained on a slightly different dataset. Some might be slightly better at coding, because examples of code were more present in the training set. Or, there is currently also this reinforcement learning way of training better coding models, so some companies maybe spend more time on reinforcement learning to make theirs a better coder. One model can answer some question about math, because a particular paper on math was in its training set, while for another model it wasn't.

So basically, it's trial and error. For example, I use language models to help me code faster. I can go to one model, and I have a choice today, though it changes every month: today I have a choice between Claude from Anthropic, Grok from X, and ChatGPT, and recently also Gemini from Google. So I come to the first one, I explain what doesn't work in my code, and I say: fix it. It produces some code that is supposed to be fixed. I apply it and see it doesn't work. I go back and say: it doesn't work, and I see this problem. It tries to fix it again. I test again, and it doesn't work again.
I will not spend more time with this model, because I already see that it struggles. So I will go to the next one, and there is a high chance that the next one will be better for this specific problem. But if you then say: okay, this one is really better for code, it's not true. You will find another example where the second model fails but the first one solves it. It's because you don't know what was inside, so you cannot really tell what it is capable of. Every person might find their own tricks and their own ways of using them. And the most important thing is to find when to use them and when not to, because they are a good way of increasing your productivity, but if you try too hard and it doesn't work, you can end up losing productivity. For example, I might spend maybe five minutes and find the problem in the code myself, because I can code; but I spent 30 minutes trying to push the LLM to find the answer for me. So I lost 25 minutes that I could have spent otherwise. For every application, people should decide for themselves where they save time and where they lose time.

Diego Calligaro (28:37) And if we put ourselves in the shoes of the CEO of a company that wants to start using AI in their operations: what do you see as the main steps, the ideal ones, that should be followed?

Andriy Burkov (28:54) Well, the first step is to be realistic that it's not intelligent. I know it's hard for many people, because they don't know the mechanics of it. I try to explain it in very simple terms, but there is so much noise, and other people convincing them to just use AI and it will solve all their problems, that they don't even know whom to believe. If someone in this situation is listening to me, what I would say is this: you should be realistic about whether your data, the data you work with in your business, has a high presence online. For example, if for everything you do you can Google the information, find it, and be productive that way, then ChatGPT or whatever LLM will most likely be a big contributor to your efficiency. Because if you can find something online, it's 99% guaranteed that it was used to train an LLM. Where you would go to Google and spend an hour finding all those documents, reading them, and combining them, an LLM can do it in a matter of minutes. So of course it will be a significant time saver.

But let's say you have a company that repairs tractors, commercial tractors, and you know there are specialized databases of the different parts tractors use, databases that only you and people in your industry use, including whether one part can replace another. To decide whether this part can replace that part, you need to look into its properties: what it is made of, what kind of metal is used, whether it's resistant enough to a given pressure, and so on. You, as a specialist, will look at these numbers and make a decision: yes, I can replace this part with that part and repair this tractor. So if it's some specialized database, with specialized data that probably no one in the world except your colleagues knows, and you know that you will find this information nowhere online, that you really have to go to those specialized resources, then the chance that an AI will do this search for you by itself is zero. You will not be able to do what they advertise, where you take this agent, explain to it what it's supposed to do, and then it does the work for you. If the LLM has never seen these specialized documents describing the different parts of different tractors, you can explain, you can say: now you will search for the best replacement for a given part. It will not do it, because it doesn't know anything about these tractor parts, so it will fail.

So, yeah, it's as simple as that. If Google is enough for you to be productive, an LLM will do it; if it isn't, it will not. This is why, for example, translators are currently the profession most in danger: the web contains plenty of documents in different languages, so for these models, translating is not something they have never seen. But for someone who repairs cars or tractors or ships, or builds, I don't know, nuclear reactors, there is no way an AI can do anything meaningful there.

Diego Calligaro (32:35) And what about those companies that have the data internally, so it's not online? How should they use this data, how should they leverage it? Is there any way?

Andriy Burkov (32:47) Well, this is also a problem. You probably remember the hype around Hadoop and big data, 15 years ago or so. At the time, it was a hype similar to today's LLMs. The people who tried to sell you these technologies and services said: you sit on so much data, and you just need to unlock it. And to unlock this data, you have to pay us. We will install a Hadoop cluster in your environment, we will put your data there, and it will crunch it. And after the crunching, you will get insights and you will beat your competition. It was like this. And today the talk is exactly the same: you have so much internal data, just bring in our agents, they will discover everything and they will work. It's not true.

The reason this data is so hard to unlock is that it's messy. Every company stores its data in its own funny way. Some store everything in an Excel spreadsheet. Some store just a bunch of files in a large collection; some save logs to a database. It's like people's homes: if you come to someone's home, they organize their stuff very differently. It's very hard to create a robot that will help you with your home, because it's all different. We can think of a robot like a Roomba vacuum cleaner, because floors are more or less the same everywhere. But everything above the floor is all different, and we people are sometimes weird and organize things in weird ways. It's the same with organizations. The data is stored in so many different ways and formats. It can be complete, incomplete, conflicting. For example, some document can be from 10 years ago, and everything now is different, but it describes the system as it was 10 years ago. So when you put some intelligence in this environment, it will not be able to make sense of it. First of all, it's not prepared for this kind of information: it was trained on online data, and now it's exposed to your very specific domain data. And second, there are so many different conflicting documents, Excel spreadsheets, PDF files, and so on. For the model to be effective, you need to train it on this specific mess; you need to take this mess and somehow ingest it into the model.
Yes, there are ways: you can fine-tune on some data. For example, there are currently companies trying to build chatbots for legal data, so they fine-tune models on different court proceedings, different complaints, and so on, and try to build something. If you manage to assemble a dataset of good enough quality, something good can come out of it. But, as I said, if you give it something messy, dispersed across different formats, it will just confuse the model and nothing good will come out of it. So the problem is this disconnect: there is the outer world, which is more or less standard, where we more or less know how to put information online so that others can consume it; and there is the internal world, which is a wild, wild west, and yeah, it's hard.

Diego Calligaro (36:39) And if you look at the amount of information created by LLMs that is then put online, which, as you said, sometimes contains hallucinations: the same LLMs are then trained on this data, which was created by LLMs themselves. Doesn't that create a kind of negative loop that just degrades the quality over the years?

Andriy Burkov (37:05) Yeah, this is another problem, and so far I haven't seen any solution to it. But again, in the past, let's say 15 or 20 years ago, many already predicted that eventually so much false or machine-generated information would be online that Google would stop being relevant, because it would just show you this fake information. But Google somehow managed not to die, and more often than not you find the document you search for, well, assuming it exists. And I wouldn't say I spend a lot of time on Google deciding whether something is spam or not; most of the time it's okay.

But now there is a different level of how fake information can be produced. Previously, to generate some fake document that just contains a lot of keywords and exists only to attract Google, you needed to program some algorithm that says: okay, I will take this existing article and plug different fake keywords into it to attract Google. And detecting such content was more or less easy for Google. It's a binary classifier: you show it a fake page and say it's fake, then you show a genuine page and say it's genuine, and the AI eventually figures out how to distinguish one from the other. But now this fake article can be generated by a very capable LLM. You can take the best-quality model today: the article has structure, has tables, you can ask it to add images. It can create a very plausible-looking article which, again, is fake. And now it's very hard to train a binary classifier to say: this was generated by AI and this wasn't. Why? Because these LLMs, if you remember the definition, were fine-tuned, on a huge number of documents, to simulate how we people write. And they simulate how we write so well that even we fall for it: wow, it's intelligent, I really like it. If we as humans cannot distinguish one from the other, how can you expect to train a machine to do it? How will you label "this is spam, this is not spam" if you are not sure anymore?
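The earlier, pre-LLM version of this detection problem is a standard binary text classifier, easy precisely because keyword-stuffed spam looks nothing like genuine text. Here is an editor's toy sketch of that older setup with made-up data, using scikit-learn; a real system would train on millions of labeled pages.

```python
# Binary "fake vs. genuine page" classifier: label examples of each class
# and let the model learn to separate them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pages = [
    "buy cheap pills best price buy now cheap pills",    # keyword-stuffed
    "casino bonus casino win money casino bonus",        # keyword-stuffed
    "our study measured rainfall across three seasons",  # genuine
    "the recipe needs two eggs and a cup of flour",      # genuine
]
labels = ["fake", "fake", "genuine", "genuine"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(pages, labels)
print(clf.predict(["cheap casino pills buy bonus now"]))  # -> ['fake']
```

Andriy's point is that LLM-generated articles defeat this setup: the surface features that used to separate the two classes are exactly what the LLM was fine-tuned to imitate.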
So this is where many believe, and I think I am more kind of in the middle. I think there will be a degradation of the quality of the web, and in the worst-case scenario it will become absolutely unusable, and we will be back to our 1990s, when there was no Google. It was AltaVista, and it was impossible to use: it was spam after spam after spam. This is why Google became a thing: when they showed their search results, it was night and day compared to AltaVista, which was entirely ruined by spam.

If this worst-case scenario is realized, we have two problems. One, the web is unusable; and two, the data on the web can no longer be used to train those models. So the models will stop evolving, because the companies will be afraid to put any new data into them. And if you don't put new data into them, first, they don't become smarter, and second, they become outdated. People will say: well, it tells me that, I don't know, George W. Bush is still the American president; let's be realistic, you shouldn't make such mistakes.

And if this worst-case scenario is realized, what I think will happen is that consumers will stick more to people or companies that already had a reputation. They will say: okay, at least this person, I have followed them for five or ten years, and if they post something, I will trust that it's reliable. But this guy, it's the first time I see him, maybe he's a bot; today you can generate images, a person can post on their Instagram, and in reality those images were machine-generated. So the economy can become very influencer-centered. If you were an influencer before, people will come and stay. And for any new people, it will be very hard to become important, because our first idea about anything new will be: 99% it's fake, 99% it's a bot.

Andriy Burkov (42:09) Convincing people to believe you will be hard. But I think this worst-case scenario probably will not be realized, because there are other ways of filtering out this fake content; there is a notion of reputation for different resources. Google knows a lot about different websites, and they might say: well, if we manually check your content and find at least 2% machine-generated content, your entire website will be out of our index. That's a serious threat to the owner of a resource, so they will make sure that no machine-generated or fake information penetrates and ends up on their resource. We don't know how it will play out, but I think it will be somewhere in the middle.

Diego Calligaro (42:57) Makes sense. Then, about the data, how these companies acquired this data: there has been this trend recently with the Studio Ghibli animation style all around, and when you spoke about this, you said that this is probably the largest identity theft in the entire history of art. Can you share with us what this copyright issue is, and what is your stand on it?

Andriy Burkov (43:22) Yeah, well, as an author I take it very personally. And by the way, all my books are openly available online, on their own website, so I don't mind people making copies of my content. The only thing I care about is that they don't pretend it's their content. I don't care about piracy in general, because you cannot really stop a person from making a copy of your book and giving this copy to their friend. Like, should we have a police for that? You make a copy and you give it to your friend; well, people make copies, okay? It's a part of our life. But if someone makes a copy of my book, puts it online, and sells it, then they steal from me, because the person who buys this counterfeit copy doesn't know that the author is not rewarded for it. If you sell a counterfeit and you say it's a counterfeit, well, again, if the person decides they don't care about rewarding the author and just wants to pay as little as possible, there is nothing I can do to convince them that this is wrong. But usually, if you sell a counterfeit, people don't know it's a counterfeit, and this is really wrong: you lie to the consumer, the consumer thinks that the author is rewarded, and the author is not rewarded. So this is wrong, okay?

So now, about this identity theft. What is the problem here? The problem is that, historically, copyright law is different in different countries, so I will not claim to know all the details of how it was defined where. But overall, the general understanding in the past was that you cannot make exact copies, or a copy that is indistinguishable: different, but to the point of confusing the source and the copy. If you do that, it's copyright infringement. This is why, for example, you cannot take your favorite Disney character and start making movies or animations about them. People will be confused; they will think: okay, it's Disney, because I saw this specific character in a Disney movie, and now this new movie comes out, so it should be Disney. Again, it's similar to the counterfeit issue: the consumer might think it's Disney, but in the end, the people who just copied the character get rewarded, and not the creator of the character.

But there was a kind of line here. You couldn't copy a specific character, but nothing could stop you from copying the overall visual style. For example, someone can decide to make a movie in the style of Lego, but instead of Lego they use some other toys; the overall idea is the same, it's a toy world. No one can tell you: no, the Lego movie was first, they made a movie about toys, you can no longer make movies about toys. That would be too much; you cannot be the only person in the world allowed to make movies about toys. This is why they say: you cannot copy things, but you can copy ideas. If someone had the idea of a movie about toys, anyone can make movies about toys. But if someone makes a movie about a specific toy and you make a movie about their toy, that's a problem, because now you are copying not a kind of design, but a specific object.

And now, with these LLMs, this idea that you are allowed to copy the style is where it goes wrong, because previously the style was in a creator's mind. Previously, there was a designer, and they drew in their style, and yes, they learned from the best and absorbed different techniques that they loved, and they applied those techniques when working on their creations. We call this a normal creative process, or a normal scientific process: someone invents something, you learn from it, and then you use the best of what you learned to create something different, in your own way.
So people were allowed to reproduce styles, and it was okay, but now they created a machine that can do it, and this is entirely different. The rights that exist for humans, like the right to learn from others, are fundamental rights; this is how our civilization was built. That's one thing. But if you give this right to a machine, it's a different thing. It's like counterfeiting. For example, what I criticized in my post is that they took the whole collection of one animation studio and fine-tuned their model so that it can reproduce this distinctive style, and now you can take any picture and transform it into this style. You can draw some picture on a napkin and say: generate a perfect version of it in this style. This is not a designer who learned to work with this style, because usually, when a person learns something, it's a combination of many different things they've learned; they will never reproduce someone's style exactly, they will always bring something of their own. But this machine just says: this style sells well, okay, I will automatically transform my style-less document into this stylish document by taking the style from this studio.

Andriy Burkov (49:18) This studio developed their style over decades. They invested a lot of money, and they invested a lot of talent. And now the machine just says: this sells well, okay, we will generate it for others. So the problem now is that we need to revise copyright and say: when a human works in a style similar to another human's, it's okay; when a machine automatically copies someone's style and sells it instead of the creator of that style, it's not okay. But it will take years, maybe decades, before we get some legislation changed for this.

Diego Calligaro (49:59) And if you look at the data we put directly inside the LLMs: there's the ability to share documents, to share everything. Some LLMs, like Gemini in the paid version, say they're not using that data to train their model; others probably do. What would you say is the best way to protect the data you're sharing with those models, if it's your own private data, essentially?

Andriy Burkov (50:29) Well, again, this is a gray zone, and those who train models say they don't violate anyone's copyright, because, again, our copyright law wasn't prepared for such a mass-scale infusion of documents into a model. They might say: well, we can train our model on the documents, because we don't reproduce those documents exactly. If you ask an LLM, "write me Harry Potter book number two", it will not do it. And it's not realistically possible, because, as I said, when the LLM predicts the next word, it predicts a distribution, and you sample from this distribution, so inevitably some words will be sampled not exactly as they were in the original. So they say: well, yes, we used your document, but we don't reproduce it, so it's not piracy; we don't pretend that we write Harry Potter books, so it's not what they call counterfeit. And if it's not piracy and it's not counterfeit, then what is it? Show me any legal norm that all of this falls under, and there is nothing. And usually, when there is nothing, it means you can do it, because everything which is not forbidden, you are free to do. This is why laws usually forbid rather than allow, okay?
The copyright law says: you cannot do this, you cannot do that. And then you say: for what we do, there is nothing in the law, so it's okay. So how to fix it? Well, I think you cannot fix it unless you win some court case. In the U.S., for example, there is precedent rule: if a court decides that it should be done this way, it becomes law. You don't need to write a law, because a court decision is equivalent to a law. Maybe it will work this way in the US; in Europe it might work differently, maybe they will work a bit longer and regulate it. Because these models generate revenue for the companies, and the words they generate, even if they are not an exact reproduction of any document, are a combination of words coming from different documents. So normally, if you generate some document with this model and the user pays you, the authors of all those documents whose words you used should somehow be rewarded. But it's technically difficult even to imagine how you could reward, you know, a million authors, if you pulled one word from this one and one word from that one; I don't know how it can work. So it's a huge gray zone. We can criticize it, we can say it's morally wrong, but if the law doesn't forbid this specific thing, it's hard to make them stop.

Diego Calligaro (53:16) Yeah. And if you look at this data from the user's perspective: for example, I upload a document to an LLM to review it or summarize it. Is this data used to train the model, or are there ways to make sure the data stays secure?

Andriy Burkov (53:57) For example, do you think Google keeps your search history? Well, you know that it does. And do you worry about it? Or do you say: well, it's Google, I kind of trust it, and I don't search for "how to hide a body" or something like this, so it's not a big deal. It will be the same with these chatbots. If you don't care, you don't read anything; but if you care, you will not use such public services at all for your queries. And there are alternatives today. You can download open-weight models; there are very capable ones, like DeepSeek from the Chinese. And just because it's Chinese doesn't mean the Chinese Communist Party receives all your queries: if you download the entire model, its parameters file, to your computer and run it on your computer, the data will never leave your computer. And it's not just DeepSeek. There is Llama from Facebook, well, from Meta, which is relatively good. There is Qwen, also Chinese, from Alibaba. So there are quite capable alternatives that you can run locally. Of course, running such super-large models locally is technically challenging, so if you're just a regular person, you will not be able to run such a huge model locally; you will probably run something smaller. The quality of the answers you receive will be somewhat worse, but you get full privacy. And if you are a company, you can afford to buy or rent what we call a node, a kind of server with multiple GPUs; you can put an entire large model on it, and you will get quality almost as good as with the popular commercial models. So if you care, just do it locally.
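A minimal sketch of the local, private setup Andriy describes, using the Hugging Face `transformers` library. The model name is only an illustrative open-weight choice; the first run downloads the weights once, after which prompts never leave the machine.

```python
# Run an open-weight model fully locally: no query is sent to a hosted API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small enough for a laptop
)

out = generator(
    "Summarize in one sentence: our internal Q3 report shows ...",
    max_new_tokens=80,
)
print(out[0]["generated_text"])
```

Larger open-weight models need the multi-GPU node he mentions, but the pattern is the same: the parameters file lives on your hardware, and so does your data.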
If you don't care, well, you can use any model, and it will be the same for you.

Diego Calligaro (56:02) And looking at the future, there's a lot of talk about AI agents and how they're going to change work. The CEO of NVIDIA said that AI agents won't be able to do, let's say, 100% of the job, but at least 50% of the job together with people. So what's your stand on AI agents? What do you think about this?

Andriy Burkov (56:28) Yeah, well, first of all, what they currently call agents is just a rebranding: these are just rebranded language models. What is an agent? An agent is a language model that you instructed to do some task for you. For example, you say: okay, you are an agent that feeds me news on this topic every day. So it's just an LLM: you say in the prompt that you want it to feed you news on a given topic, and then you also allow it to use a search engine, for example. The LLM will say: okay, now it's time for me to get you news on this topic. What will it do? It will say: first of all, I want to execute this search, and you connect this request to a search engine. The search results are downloaded and presented to the model: you put these documents in the prompt, and the model says: okay, I think you will be interested in article number three, article number seven, article number 22. And in the end it says: okay, I will write a short summary of each article and send it to you by email. So again, the LLM generates the command: send this text to this email address. And if you equip your LLM with access to this search-engine tool and an emailing tool, in the end you will receive your email with some news that the LLM thinks you will find interesting.

Or it could be a different agent. You might say: okay, shop for the best dishwasher for me. It will probably ask you questions: what kind of dishwasher do you want, what size, stainless or plastic, and so on. You just have to instruct it how it should behave. And then, again, we go back to our initial discussion: it's all about the dataset. Shopping online is described in plenty of documents, how to shop online better, and there is even a lot of code that implements such shopping agents. You can download Python code or whatever from GitHub, and it implements: okay, I need to call this function to get the list of products, then I do some analysis and output the result. So these LLMs have been trained on examples of how to be a shopping agent, how to be an information-gathering agent. For them it's natural: you ask them, and they do it.

But, going back to our discussion, if you want an agent that will find replacement parts for a tractor, it will fail, because it hasn't seen anything about tractors, or not as much as about dishwashers and microwaves. So yes, in some cases agents can be useful; they can sometimes save time.
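The news-agent example Andriy walks through is essentially a loop of LLM calls and tool calls. Below is a schematic editor's sketch; `call_llm`, `web_search`, and `send_email` are hypothetical stubs, not a real framework API, and real agent frameworks differ in the details.

```python
# "Agent = LLM plus tools": the LLM decides, the tools act.
def call_llm(prompt: str) -> str:
    # Stand-in for any chat model (local or hosted); returns a canned reply
    # so the sketch runs end to end.
    return f"[model reply to: {prompt[:40]}...]"

def web_search(query: str) -> list[str]:
    # Stand-in for a search-engine tool.
    return [f"[result {i} for {query!r}]" for i in range(1, 4)]

def send_email(to: str, body: str) -> None:
    # Stand-in for an emailing tool.
    print(f"email to {to}:\n{body}")

def news_agent(topic: str, email: str) -> None:
    # 1. The LLM decides what to search for.
    query = call_llm(f"Write one web search query for news about {topic}.")
    # 2. The tool result is pasted back into the prompt.
    articles = web_search(query)
    digest = call_llm(
        "Pick the most interesting items and summarize each briefly:\n"
        + "\n".join(articles)
    )
    # 3. Another tool call delivers the output.
    send_email(email, digest)

news_agent("machine learning", "reader@example.com")
```

As in the transcript, the whole loop only works when the underlying training data covers the domain; the dishwasher agent works, the tractor-parts agent fails.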
But they can also become a source of wasted time, because if it's not their domain, they will send you information that you really find irrelevant. You will read it once, you will read it twice, you will read it a third time, and you will say: no, really, I'm just wasting my time, and that's it. It's like Google Home, for example. I bought one probably seven or eight years ago already; it was advertised like: okay, you can talk to your speaker, and it will answer your questions and do stuff. But the only things I currently use it for are asking what the weather is today and putting stuff on my shopping list. For everything else, I think I waste more time than I benefit.

Diego Calligaro (1:00:24) OK, thanks for the clarification. And looking again at the future, there's a lot of talk about AGI. There are many statements from CEOs of large corporations, maybe sometimes driven by interest in terms of valuations as well. But there are also individuals, for example Ray Kurzweil, who speak about the singularity and the big transformation AI can bring. Let's imagine we are 10 years ahead, it's 2035. How do you see the world, and what is AI bringing to it?

Andriy Burkov (1:01:06) Well, it's a tough question, because we know that LLMs will not lead us to this AGI, for the reasons we discussed: we cannot have data that covers all possible questions, it's just impossible. Now, whether during the next decade someone will come up with a better idea of how to create a more capable or more universal AI? Maybe, maybe not. In my book, I show a graph of predictions: scientists were asked to predict when they think a human-level AI will be invented, and starting from the 1950s, scientists consistently predicted that it's about 25 years away. It's been like this since 1950, and we are in 2025. For 75 years, it was always predicted to be 25 years away. So, I don't know, it should be 25 years away in 2025 too.

So it's all about the idea. For example, if you had asked me three years ago whether a language model as capable as ChatGPT could exist, I might have said: well, possibly, but very unlikely. Because at that point, let's say in 2020 or 2022, we believed that one-shot was doable: you give it a text in French, it gives you back a text in English, that's it. Or you ask it: I don't know, what is a plane? And it says: a plane is a flying object, a means of transportation. But then you try to ask a follow-up question, like: why is this, or why did you answer it this way? And the model couldn't do it, because one-shot was believed to be very similar to classification: you show it something, and it gives something as output. A conversation is much harder for a model; the model would forget very fast what was discussed previously. And then, wow, no one expected it: you wake up one day and you see this ChatGPT that can maintain a coherent conversation for multiple turns and remember what was said five turns before. And it changes everything: your perception of what's possible is now totally different. Personally, in the beginning, because information about how they built ChatGPT was scarce, I was under the impression that it was much bigger than it was. And then, once the information came out about how it was trained, what the math behind it is, how the data was used, you see: okay, it's just supervised learning, not different from cat versus dog, just on a bigger scale.
So when you realize this, you say, well, no, it will not become as intelligent as us because... It lacks so many pieces that make humans agentic and LLMs not. So what it will be in 10 years if such idea again gets proposed and it succeeds, can be anything. But it could be nothing as well because we cannot really predict the appearance of ideas like chat GPT. Because we couldn't predict it three years ago. How can predict it today? We can. So you can have multiple hypotheses if this, then that. And those CEOs in their interviews, they don't shy saying that in two years, in five years, it will become so intelligent that people will be out of jobs. I doubt it because they don't have any data. or science to confirm these claims. So they probably do it because you cannot build something big if you don't believe that you can do it. If you hire an architect to build a skyscraper and they think that they don't believe that they can go higher than two floors, then they will not build it. So if you are CEO of a company that should be in some bigger place in five years, you need to kind of believe that it's possible. And you also need to convince people around you that it's possible. This is why they give all these interviews and make those bold predictions, because otherwise they wouldn't be where they are. So this is, know, like every person at every moment in time is where they are and they act... according to their current situation. Then they retire and they will talk about different stuff which often happens. So we will see. Diego Calligaro (1:06:03) And thanks a lot for all this information and last closing questions. So you're a top leader, you you have an impressive career and looking back at your career and life, what would you have done differently? Andriy Burkov (1:06:18) many things. But I think that if I knew that I can communicate complex concepts to people in a way that I can and people will love it, I would probably, first of all, write my book, my first book much earlier. And I would probably write more books than I did. The reason why, because I wrote my first book and then just less than two years ago I wrote the second one. And the second one didn't go as well as the first one. And I was like, okay, maybe it was just, you know, one-off success. So, okay, well, at least I had one book that people loved. But then I wrote this book on language models recently and again it was super well received. So I think if I knew that it could play out this way, I wouldn't stop after the second book. I would write the third one and the fourth one and maybe I would become a professional writer several years ago, who knows. So maybe this... And also when we moved from our country to Canada, I chose Quebec City because I thought that, it's a small town and we are from a small town, so it should be better. But I think for the professional perspectives, going to a bigger city would be probably a better idea. I wouldn't go probably to New York or Chicago or Toronto, but I think Montreal, it's a good mix between being a small but not as small as Quebec City, and again, my life could be different. something like this, but there is no significant regrets in my life, but just that some decisions... could be slightly different and life probably would be different too. But who knows, maybe it would be worse than it is. So I don't complain. Diego Calligaro (1:08:20) looking also at the accelerated pace of change in which technology is entering more and more in the life of people. 
What scares you most, and what excites you most, about the digital future? Andriy Burkov (1:08:34) I think, when we think about the future, people are afraid of it. I always find it fascinating that the Queen of England, who died, I think, a year or two ago, was born before the First World War, when there were no cars and no planes, only, you know, horses and bicycles. And then the First World War, and then they invent cars and buses and planes. And then the Second World War and the industrial revolution, and they invented all those huge mechanical tools that made the pace of progress so much faster than it was before. And then computers, then Wi-Fi, and cell phones, and TV. There was no television when she was born, only radio. So if you take her life, every 10 or 15 years there is a total revolution; the world turns upside down. It might be scary. When you're the Queen of England, maybe it's less scary, but for a normal human, so much changes in their lifetime, it's a lot. And we are living through something similar, because if I remember myself when I was younger, no two devices could be connected to one another. You buy this, you buy that, and they cannot interact. And now my kids sometimes say, dad, why doesn't my phone connect to my TV? And I say, well, sorry, but your TV is from 2007. They didn't have Wi-Fi. That sucks. So, you know, a lot of things changed. The internet: we are overwhelmed with information. No generation ever had as much information as we have, and now it's not just information, it's manipulation on a massive scale across social networks, and we consume all of it, and the more you consume, the more anxious you become. First of all, you cannot digest all of it; second, the information is so polarized. They always want us to believe that something is more scary than it actually is, because if you are scared, you keep consuming this content, and they sell advertisements and so on. So we live in a very, very difficult time for our brains to catch up. So yes, I think our entire generation, we are afraid of the future, but again, if you go back to 1900, it was just as overwhelming. Imagine no cars, and then there are cars. Imagine no TVs, and then there are TVs. So every generation has this fear of what will come next. Maybe at some point we will invent everything inventable, and then we will live for centuries without anything new. I doubt that it will ever happen, because people have this curiosity and always want to break the rules and invent something that wasn't invented yet. It's just part of being human, I think. Diego Calligaro (1:11:48) Amazing. Andriy, thanks a lot for all these insights and a great conversation. We'll definitely speak again in the future. Andriy Burkov (1:11:58) It was a pleasure, thanks Diego for having me. ICEO Technologies (1:12:02) Thank you for watching this incredible interview. Leave us a comment with what you think about this episode, what you would like us to discuss in the future, or any guest suggestion, and remember you can follow us on social media; you will find the links in the description. Thank you again for your time, and see you in the next episode.
About the Guest
Andriy Burkov
Andriy Burkov is one of the most influential AI experts in the world. He is the author of several best-selling AI books, like The Hundred-Page Machine Learning Book.
Andriy has a PhD in AI, worked in leading AI roles at multinationals like Gartner, and has over 1 million AI enthusiasts who follow his insights across his popular LinkedIn newsletter and social media platforms.