
Transcript

Ken Kadet: Has AI finally arrived in software development? That's what some industry watchers are saying, and not all of them are happy about it. OpenAI's ChatGPT has been a water cooler topic for weeks now. ChatGPT is, in quotes, "a powerful language model developed by OpenAI that is capable of generating human-like text in real time," and I know this because ChatGPT told me so. It could potentially do your homework for you, improve customer service, or even write your code, which could have a host of implications for cybersecurity and the way enterprises do business today. I'm Ken Kadet, and this is the Entrust Cybersecurity Institute Podcast, and today's focus is the cybersecurity implications of AI-driven software development. With me today is our brain trust from Entrust: Anudeep Parhar, our COO for Digital, and three members of our software leadership team, Greg Wetmore, Tushar Tambay, and Ghufran Mahboob. Welcome, everyone. Let's start with Greg. As one of our software development experts, what's going on with AI and enterprise software development?

Greg Wetmore: Well, AI and software development has rocketed right to the top of the hot topic list, both in the open-source world and at commercial software companies. That activity was really sparked by Microsoft's launch of a feature called GitHub Copilot. They launched it in general availability in the middle of last year, but even just a couple of months ago they announced broad support for Copilot in the most popular IDEs, and they announced a licensing program for commercial software companies. GitHub Copilot is what they call an AI-powered pair programmer, and it can do some pretty amazing things. If you imagine a software developer working in their development environment trying to accomplish a particular task, Copilot can pop up, figure out what that developer is trying to do, and suggest even a whole block of code that finishes what they're doing. It's sort of autocomplete on steroids. Even more amazing than that, you can ask Copilot, just using natural language, to build what you specify: a software module, even a whole program that acts on certain inputs, performs some behavior, and creates certain outputs. It really is pretty amazing. Copilot is powered by the GPT-3 AI model, or AI engine, produced by OpenAI. It's the very same AI model that's underneath ChatGPT, which you introduced in the intro, and ChatGPT is, of course, exposing the world to some of the incredible things state-of-the-art AI can do. If we take the discussion back to software development, there are some definite benefits here. Copilot promises to make development more efficient and automate repetitive tasks. Maybe developers don't have to go read API documentation to learn the minutiae of a given interface in order to use it. Ultimately, it could reduce the level of expertise required to get a task done, allow tasks to get done faster, and make companies more efficient in their engineering. The flip side is that we're really starting to talk now about some of the risks or negatives attached to AI-driven software development, and two that stand out in the discussion so far are legal issues and security issues.
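[Editor's note: to make the interaction Greg describes concrete, here is a minimal, hypothetical sketch. The developer types only the comment and the function signature; the body is the kind of completion an assistant like Copilot might propose. The function name and behavior are invented for illustration, not captured Copilot output.]

from datetime import date, datetime

# Developer's prompt: "Parse an ISO-8601 date string and return how many days ago it was."
def days_since(date_string: str) -> int:
    # Suggested completion: parse the string and compare it with today's date.
    parsed = datetime.strptime(date_string, "%Y-%m-%d").date()
    return (date.today() - parsed).days

print(days_since("2023-01-01"))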

Ken Kadet: What are we seeing so far? I mean, the issues sound incredibly wide-ranging at this point. What are some of the legal issues we're seeing already? I've seen word of some lawsuits already regarding GitHub and ChatGPT. Describe that a little bit. Are these copyright issues, or are they security-related?

Greg Wetmore: Copilot was trained on essentially all of the public projects inside GitHub and, for the most part, that's open-source software. So, essentially, the training model has been built up with all of this publicly available software. But that software is released under licenses, and there are copyright attribution requirements to satisfy those licenses if you're going to use that open source. What some of the early adopters have found is that, when they create software using an AI engine like Copilot, blocks of code come back that pretty closely match open-source projects, for instance. In November of last year, a couple of law firms and a software author launched a class action lawsuit alleging copyright violation. So there are definitely some legal issues around attribution that are currently being discussed and probably need to get resolved before we see really broad adoption of AI-driven software development.

Ken Kadet: All right, let's bring in other members of our software team. Tushar, is this a cause for excitement or concern when you see all this, the introduction of AI into software?

Tushar Tambay: I think there's a little bit of both. As most companies are finding, especially after this lawsuit became public, there is concern that, if your developers are just using this tool willy-nilly, they are possibly bringing in insecure code, or code that you can get into legal trouble for using. I think there's a lack of appropriate guardrails around how this technology can be used, and we're still seeing the process of building those guardrails play out. There are earlier examples of this. I was reading somewhere that the parallel with Napster is very interesting, because the technology was there to bring all of this music and media to a lot of the people who were really interested in consuming it, but the way in which it was brought about was essentially illegal. Then those models evolved, people figured out the right way to share, attribute, and reward content creation, and now we have Spotify and iTunes or Apple Music and the equivalents. Something similar needs to happen here. Although it's exciting that there's this ability to create code so quickly, how we use that tool is something we'll see play out over time, and people will develop best practices around it.

Ken Kadet: What do you think, Ghufran? Do you agree with that? What's your perspective on it?

Ghufran Mahboob: Yeah, absolutely. This technology is already out there, and adoption of it really is inevitable. It's not coming in the future; it's already started. So, really, we need to embrace it, but we need to embrace it with care. We need to make sure we are being careful about the security and legal aspects of it, and those guardrails, as Tushar mentioned, will come. If we look at this technology fundamentally, I think the whole reason we are having a lot of this discussion is that, for the first time, an artificial intelligence is learning from others and producing output. Consider this: if I go to college and learn from professors, learn from examples of code and so on, and then I use that knowledge to produce code, which may be similar to the examples I learned from, I don't think anyone would have an issue with that. That's how human beings progress, by building on knowledge from previous generations, from previous work. A lot of the excitement is that, for the first time, we have artificial intelligence that's actually doing this, and, for sure, some of the knowledge it's learning from and regurgitating is not being attributed correctly. That's definitely something that needs to be fixed, but it's exciting. This is going to be, perhaps, one of the biggest game changers, not just in software but for humanity overall.

Ken Kadet: Yeah, definitely, a lot of excitement around this. Anudeep, from your point of view, when you start to think about and ask people what this is going to mean for enterprises, for example, how do you start to think about the cybersecurity implications of this kind of software development? What kinds of questions are you going to be asking?

Anudeep Parhar: That is an interesting question. Piggybacking on what the guys were talking about, I'm not a legal expert, but you can certainly see that there are some legal ramifications, and they will flow into the enterprise world as well. In my mind, I separate the conversation into a couple of different things. There's a technical conversation around the quality of the content that's being produced and whether there are issues with that content from a technology perspective, etc. Let's come back to that in a second. Another way of looking at it is to look at the problem, so to speak, sideways. I don't think ChatGPT is going to be free forever. I think it's going to be sold to you through a tool. Visual Studio produces a lot of template boilerplate code for you, and that's covered under its licensing mechanism. I think that's the natural end for this: it becomes the über assistant, the über boilerplate code generator. If you go far back enough, that's the entire evolution of the integrated development environment. I don't want to date myself, but I used to program 8086 chips in hardcore assembly. Then the IDEs were built, and the multiple generations of languages came, fourth-generation languages, etc. I think this becomes the next generation of writing code. But from a legal and enterprise perspective, I think it's going to be really hard to put controls around the question, "Are you plagiarizing the code or not?" It's very hard to do that. I'm sure somebody will try to build some technology to detect whether code has been plagiarized or is unattributed. Where I think it's going is, for 20 bucks a month, you can buy a license, and all of our developers can produce as much code as they like, probably by subscribing to ChatGPT. If you look at even the OpenAI and ChatGPT structure right now, you can subscribe to domains. You can pick which domain you want to use to create content. On one hand, I'm really in awe of what the ChatGPT and OpenAI folks are thinking about. They're putting domains in place where you say, "Use this domain to create content," and, like Greg was saying, the training of that domain, etc. Anyway, practically speaking, I think the legal issues are going to go away, because I don't think we are going to use the open, free version of it, so to speak. This is your typical freemium model in action. We're going to get addicted to it, and then 20 bucks a month is going to be nothing.

Greg Wetmore: You're absolutely right, Anudeep, and they're already there with the first-one-is-free model. When it launched, you could freely use it and try it out. Now, with the enterprise licensing, it's 10 bucks a month for software companies, licensed for every one of your developers. So they're right there. Absolutely, the thought process is how to generate revenue off of the AI capabilities, so you're right.

Anudeep Parhar: The other piece, and this kind of opens the conversation to new possibilities, so to speak, is ownership. Fifteen years ago there was the quintessential lawsuit over who owns your tweet: once you write a tweet, who is the owner of that tweet? Because I wrote it, is it mine? Is it Twitter's? Is it the public domain's? So I think there are going to be some issues there. We had the same issue when the cloud started. In a multi-tenant cloud-based system, you cannot sell the usage of an individual user, but collectively you can monetize the collective analytics and trends. So I think there are going to be some really, really interesting monetization models on this as well. It starts as an assistant, then there are paid models, and then there is other monetization that will come out of it. I absolutely see that as the logical conclusion to this stuff. Just building some code on your own, there was an article today along the lines of, "Hey, I'm only using ChatGPT after 10:00 PM at night," as a sort of personal plan, the equivalent of a video subscription. But if I really need to bring it to the enterprise, it's 10 bucks a month, and I think then you get higher quality, better attributed code, and you can have ownership of the code as well. So that's how I see it.

Tushar Tambay: In terms of exciting uses of it: obviously, it's been mentioned that it can produce code that turns out to be unattributed, and that's a problem, but I believe it's also going to be pretty useful for spotting code that is plagiarized. It could work both ways, spotting code that is plagiarized as well as checking code that has been developed outside of it and providing feedback on it, the kind of things IDEs already do, but at an orders-of-magnitude better level. The other thing I wonder about is large enterprises. We've been talking about people who use ChatGPT possibly pulling in code developed anywhere in the world, but there is a real issue that, within the same company, there isn't as much code reuse as you'd want there to be. I wonder if training this tool on a more local data set would help us produce more reusable code within groups and within enterprises, which is goodness, and of course the company holds that code. The problem typically is that people, even within groups, will write the same thing over and over in slightly different and less maintainable ways. So it could help us there. There are several possibly very exciting possibilities.

Ken Kadet: Yeah, definitely. So I guess, for me as the non-software developer here, I'm just wondering about the scary parts of this. Are we worried about AI developing software, if you want to put it that way? Are we worried about different kinds of vulnerabilities, or new kinds of vulnerabilities being created? And what is the concern about how bad actors are going to find their way into these things and find ways to influence the code? I mean, is there a possibility of AI bringing backdoors into the fray that people can find? Where do you think things are headed?

Greg Wetmore: Yeah, that's absolutely part of the discussion and part of the risks and downsides we're thinking about right now. There's a saying, "Garbage in, garbage out." All of these AI models are trained on massive amounts of data, and of course that data potentially has defects in it, even security defects. There's actually a study, a couple of years old now, from 2021, so in the world of AI that's a little old, but it found that 40% of the programs that came out of the GPT engine had design flaws or defects that could be exploited by an attacker. So it really puts more onus, I think, on developers and development teams to oversee and think hard about the output of these AI engines, to apply secure coding practices, and to follow their secure SDLC to make sure what they're producing is in fact secure code. The other piece of this part of the discussion, which you hinted at in the end of your question, is who can influence the data that's being used or the AI model that's underneath these tools. There's some discussion now about things like data-poisoning attacks. I think about it a little bit like fake news, where you have these bots out there on social media platforms spamming out all of this fake news and it suddenly becomes part of the discussion in society. By the same concept, if we're training these AIs on all these open-source projects, you could have hundreds or thousands of submissions of deliberately malicious code intended to influence the AI models. So there's some really interesting discussion around security happening attached to this AI-powered software development.
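[Editor's note: as a hypothetical illustration of the kind of exploitable defect that study points to, the sketch below contrasts an insecure query pattern an AI assistant could plausibly suggest, SQL built by string concatenation, with the parameterized version a secure code review should insist on. The table and column names are invented for the example and are not taken from the study.]

import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable pattern: untrusted input is concatenated into the SQL text,
    # so input like "' OR '1'='1" changes the query's meaning (SQL injection).
    query = "SELECT id, email FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver treats the input strictly as data.
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()

# Small self-contained demo with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, username TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice@example.com', 'alice')")
print(find_user_safe(conn, "alice"))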

Ken Kadet: I think one of the key things here is we've now got this wonderful tool that can spew out a lot of code in short order and there are genuine security concerns around it. But on the security side of things, we don't have the same kind of tool sets available that can keep up. So organizations have always had ways to vet and validate and prove their code before it gets put out. But those were based off of the previous speed at which we could develop code. And now, it seems like with this new technology, we can probably deliver and develop very fast. But I don't think we have right now the tools available to allow us to vet and validate the output of these engines as fast as we previously could. So I think that's a key thing that will need to be addressed going forward.

Anudeep Parhar: I think these are issues that are going to get sorted out. In my mind, Greg, what you brought up about the poisoning of the data is pretty massive. In the last few years we've seen state-based actors, etc., and the rise of ransomware; all of this could be pretty significant, and as the dependence on AI increases, it is only as good as the model that's powering it. When decision support systems came out five or ten years ago, the issue was bias in the data. Here, the upside for the bad actor is huge if they can, so to speak, put a bias into the corpus of data that's powering the AI. And because of the ease of use, I think the controls up front are going to be very minimal, meaning people are going to just use it as it comes out without having appropriate controls. So that's one of the biggest things I see: the poisoning of the data. The other piece is more of a philosophical conversation. The part of OpenAI that's interesting is, of course, the AI, but also the open part, which means it's only as good as it is open. If you say we are going to have an enterprise version of OpenAI and it's based only on the corpus that, for example, our company produces, that limits the creativity or the value it provides, but it can provide a lower-risk model. If you want the benefit and you open it up, then all the risks come with it. So I think the cybersecurity models are going to have to change to address some of this. It's kind of open source on steroids; you have to figure out how to do this. The really interesting thing is that, for technologists like ourselves, and especially people who are in the cybersecurity business, it's a brilliant opportunity to come up with not just organizational solutions but also technical and policy solutions for how you manage some of this. I think that's a really interesting space, and I'd be really interested in your points of view on what kind of tech should be built to help avoid some of this going forward.

Tushar Tambay: As we were talking about ChatGPT, what came to my mind was that, in some ways, we've had a new specialty emerge over the last 10-plus years which is about poisoning Google: it's called search engine optimization. That's what you're doing; you're basically training and gaming that algorithm to promote your content to the top. And that's valuable to companies; people pay a lot of money for that. I'm not able to see yet what the equivalent would be in this world, other than for the bad actors. But there seems to be, very quickly, an emerging need to use ChatGPT in a certain way, or train it on a certain set of data, so that it gets pointed toward certain results. Who the actors would be, and whether there could be a commercially viable, legal, profitable way of doing something equivalent to search engine optimization, is not clear, but it sits somewhere on the periphery.

Anudeep Parhar: I think that's a really interesting point, Tushar. When the original crowdsourced, Wikipedia-type efforts came along, the controls that grew up around them were also crowdsourced; people had to figure out how to put controls in place. Then you have the SEO example, an entire ecosystem that balances the creation of new things, like getting more advertising or more clicks using SEO, against the corresponding controls. I think that's going to grow exponentially here. From a security perspective, one interesting question comes to mind. A while back, Greg, you'll enjoy this, when quantum and post-quantum were coming up, there was an experiment, I think at MIT, where they were trying to get one AI to hack another AI from an observability point of view. They wanted to figure out if you could actually do it, and I think there is a whole market in building AI models that can create controls to secure your ecosystem, so to speak. So I think some really interesting things will come out of this. Again, from a [inaudible] perspective, I think we should probably also talk about how organizations should think about this: what kind of adoption we should have and whether people should jump all in. How do we go about that? I don't have a particular view on it right now. I think of it as really interesting, with a huge upside, but I'd be interested in Greg's, Ghufran's, and Tushar's points of view: how should organizations think about this, should we just try it, and what should the adoption model be, so to speak?

Greg Wetmore: Yeah, I agree with Ghufran here, to think about this as an inevitable technology that's going to provide a significant, even massive, benefit over time to technology companies. I also agree with Tushar's thinking around guardrails. We're definitely early with this technology, and it's probably a little too early to blindly adopt it into commercial software development. But I believe the development will happen quickly around the guardrails and the mechanisms that allow commercial software companies, even cybersecurity software companies like us, to leverage AI in our development process.

Ken Kadet: I think that makes a lot of sense. It sounds like we're in this sort of wild west time, and we're moving toward adding the necessary guardrails and friction that will make this more secure, more manageable, and, in the end, more productive and faster for everyone. So let's end the conversation here, and hopefully we'll come back to this one soon. Let me just throw it out to everybody; we'll go around the horn a little bit. Looking broadly at AI technology and where it's headed, and thinking about what leaders in IT security, but also business leaders in general, should be considering: what are some questions you think people should be asking, or what is your prediction on what's going to happen next? Maybe Ghufran, let's start with you.

Ghufran Mahboob: Sure, absolutely. The one thing that struck me with this technology is that we are still a short while away from widespread adoption, but it is coming, which means we have very little time to prepare. So obviously we need to be working on the tools to make sure we can vet security and so on. But the other big challenge, more generically, is that it looks like we will very soon have tools that can do most of your day-to-day coding for you. So what that means is, how do you up-skill your workforce? Because most of what people are doing today, maybe 60% or 80% of it, it's not yet clear how much, can probably just be automated and generated automatically. Only the tough problems, the very difficult things, will need human intervention. So how do we make sure we have a workforce ready to attack those more difficult problems?

Ken Kadet: Yeah, great point. Tushar, how about you?

Tushar Tambay: Yeah, Ghufran made a good point, Ken. I find myself sometimes looking at how schools are reacting, given how much software development is taught there, and how they are responding to the presence of ChatGPT. I found an interesting set of instructions that a professor at Wharton gave to his class. Now, this is more about language and so on, it's not exactly coding, but the instructions are good. The memo to his students says, "Be aware of the limits of ChatGPT. If you provide minimum-effort prompts, you'll get low-quality results. You need to refine the prompts in order to get better and better outcomes. This will take work." Then it says, "Don't trust everything it says." So there are things like that which we need to teach computer science students as well as developers about how to use it effectively. We talked about the guardrails, but there's also the skill of using it well to get the best possible outcomes. And I think those things should flow back into the way we are thinking about using this.

Ken Kadet: Yeah, that's a really good point. Greg, let's go over to you.

Greg Wetmore: Yeah, I guess I take my cue from the hyperscalers. Microsoft is investing a billion dollars in this. Google very recently came out and called it a code red for their company, that's how important AI is to them. This is a transformational technology. It's probably going to touch businesses in all kinds of different ways, certainly cybersecurity and certainly software development.

Ken Kadet: Anudeep, we'll give you the last word.

Anudeep Parhar: I agree with the guys. It's hard to add to such deep thought, so let me try to come at it a little bit differently. One thing I look at is that, if we lens back out far enough as professionals, not to overuse the Gartner terms, I think we are already at the peak of inflated expectations. The journey from there to the plateau of productivity is going to have to go through cybersecurity. It has to go through securing this and making sure it's usable. So that's one point: there is a lot of innovation beyond what's already happened that needs to happen to really leverage this, and I see an opportunity there. The second thing is more philosophical. One of the things I find interesting about ChatGPT and OpenAI is that, for the first time, a lot of people were surprised. The people I speak with were less surprised by the content generation; even code generation people were okay with. I'm old enough that we used to build things via compilers and all the other stuff, so I think those things can be done. What really flabbergasted a lot of people was it automating and artificially creating creative work, actually getting at, almost cannibalizing, the creative and knowledge workers. I think the social impact and the organizational and enterprise impact is going to be pretty amazing, because as engineers and technology professionals, I don't think we are used to sitting at the receiving end of automation. We are usually at the giving end of automation, so to speak. We automate other tasks, but for the first time you see the tasks that we perform being automated. But I'm still bullish; I think there's a lot of innovation left. I subscribe to the age-old adage that technology is an extension of the hand. My hand is still needed, but if I put on that baseball glove or pick up a baseball bat, I'm that much better at the sport. So I think this is going to really, really help us. That's how I think about it.

Ken Kadet: Yeah, I think that's a good place to leave it. I definitely think this is a topic we'll be coming back to over the coming year, if not the coming months, so there's lots to watch in this space. Thanks for listening this month, and see the show page for notes and links to our content. Our podcast today was produced by Stephen Damone. If you have comments, questions, or ideas for our podcast, write to us at cybersecur[email protected]. And thank you very much for listening.