In this episode of Smart Route, we’re going beyond basic call transcriptions to see how you can start uncovering the intent behind your customer’s words and what informs their buying decisions using conversation intelligence.
We sat down with Deepgram CEO Scott Stephenson to break down what “conversation intelligence” is: how it works, what kind of value it can add to your organization, and how businesses everywhere can start using this technology to pull insights from their daily conversations with customers and understand, more intimately, what drives them, in order to refine their marketing, sales, and service strategy.
- 2021 State of Automatic Speech Recognition (ASR) Report
- The Power of Conversations in Omnichannel (On-Demand) | Deepgram + CallTrackingMetrics
- Deepgram & CallTrackingMetrics case study
- Deepgram website
(Courtney) Hello again and welcome to Smart Route. I’m Courtney Tyson, your host and strategic partnership manager at CallTrackingMetrics. Thanks for listening in. Our guest today is Scott Stephenson, co-founder and CEO of Deepgram. Scott is a dark matter physicist turned deep learning entrepreneur. He earned his PhD in particle physics from the University of Michigan, where his research involved building a lab two miles underground to detect dark matter. He left his physics research position to co-found Deepgram, where he serves as CEO today. Hi, Scott, and thank you for joining us.
(Scott) Thank you for having me. Great to be here.
(Courtney) We’re excited to chat with you. For those that don’t know, Deepgram is the leader in enterprise automatic speech recognition, or ASR for short. They work with call centers and software providers, including CallTrackingMetrics. The goal of today’s call is to talk about what conversation intelligence is, how Deepgram has used it, and what advice Scott has for businesses looking to get started.
Scott, in the simplest of terms, can you describe conversation intelligence for us? And how a business might leverage it?
(Scott) So, there’s a technical definition for it, but I’ll save that and give you the general idea. The best way to think about it is: what does a human do when they’re in a conversation? Part of this might be the transcription of the words, in other words, hearing a word and knowing what the word is. There might be a timing component to it: when were these words said? Underlying it all is trying to discover the intent behind the conversation. At Deepgram, we build speech recognition systems, and that means there’s a machine listening to a conversation, generally between a couple of humans, and it’s trying to do the job a third party would do if they were listening in, maybe somebody taking notes or that type of thing. So it might be: who is speaking, and when? What words were they saying? What was the intent behind the conversation? What language are they speaking? Are they excited or not? That type of thing. This is really valuable information that, for essentially all of the history of humanity, has never been automated. Previously, it had to be a human that listened to the conversation, and maybe they would score it or flag it as negative or positive or something like that. But it was really expensive to go down that route, and so many companies never did, or maybe they did it on a small percentage of their audio. But now AI and automation are actually becoming efficient, and with high efficacy. Over the next 10-20 years, you’re not just going to be talking to an IVR when you call in, where they say, “Do you want to talk to sales or support?” and it’s a one-word answer. You’ll be saying an entire sentence, and you’ll actually be happy about it. Right now, maybe you call in and it doesn’t understand you all that well. They’re probably still using an antiquated system when that happens.
But over the next few years, you’re going to start to encounter lots of “wow” moments when you’re interacting with an automated system. So yeah, that’s what conversational intelligence is all about. It’s taking natural voice or text that people are speaking or typing, and trying to understand the intent behind it. But it has a lot of pieces in it, like which language, what words, that type of thing.
(Courtney) I think that’s really helpful for our clients at CTM and our listeners in general. A lot of times we’re being asked about conversational intelligence, or maybe for it, but our clients or our listeners don’t really know that they’re asking for it, because they don’t really understand what it is at its core. When you think about speech automation or speech recognition, really all that has been out there in the past is general transcriptions, right? So that’s really helpful. Now, I’m sure you’ve drunk your own champagne.
Looking at your own business, what are some of the best insights you’ve been able to glean from using conversational intelligence?
(Scott) Well, one of the first things we built at Deepgram came from when we were a really young company and didn’t have anybody answering the phones. We used a VoIP company to intercept phone calls that would come to our Deepgram phone number, transcribe them, and determine whether they were important or not. If it’s an employee verification coming in, that’s probably pretty important. But if it’s not an important call, then we don’t need to go listen to the voicemail or call anybody back. That type of thing. So from the very, very early days of Deepgram, we were utilizing this for our own business. Now, we don’t get a ton of inflow into that contact point. But there are many customers that work with Deepgram that do, or they have a 10,000-seat call center, or even a 100- or 10-seat one, and they’re wondering: what is happening all day? Are we missing out on opportunities? Are competitors being mentioned? Is there anybody that is way too excited in a negative way? Or maybe you’d like to know if they’re very excited in a positive way as well. So there are a lot of use cases: in meetings, in phone calls, in a one-way short form like voicemail, or in a conversation between multiple people over half an hour or two hours. So, a lot of uses.
(Courtney) You talked a little bit about how you use it as a startup versus a large call center. The approach obviously differs between small businesses and enterprise.
How would you scale gathering voice insights if you’re a small business versus a large enterprise?
(Scott) You always start the same way: What’s your biggest problem? A lot of times, that biggest problem is not necessarily a super complicated one, either. So if you’re, for instance, a call center that has a bunch of people calling in to reset their passwords at the end of the month, because that’s when your password resetting tends to happen, it’s fairly predictable. It’s a fairly easy conversation, they just need help doing it, etc. Maybe you could automate that. Or maybe somebody just calling in for the hours or something like that, you’re going to start to see these sort of 80% use cases crop up all over the place where you use automation to take care of the one or two or top three easy things that people are typically making calls about. And then the ones that take a little more creativity, a little deeper thought, etc, that’s when you have your human agents come into the conversation.
(Courtney) That’s helpful, because we work with a lot of clients that are across the board in terms of company size. Same goes for our listeners. Our clients also use our software differently. We have clients using us for call center tools, others for marketing attribution, and some for both. I’ve worked in a client-facing role at CTM for over two years, and I can tell you, no matter how a client is using our software, they’re bringing up conversation intelligence. It’s something they’re asking us about often, and the concept is becoming more mainstream. It comes up because, I guess, people don’t really know how to use it yet, or where to get started.
How can a business get started with conversation intelligence?
(Scott) Yeah, so there’s an interesting story there. And I’ll get to an exact answer on the last question you had there. But I think people are becoming a lot more aware of the ability to use voice conversational intelligence and automation, because they’re seeing it more in the consumer world. Alexa is big, Google Home is big. Apple has a home device as well. And not only are they big and they have some hype behind them, they’re actually starting to work pretty well. In your consumer life, you can set a timer as you’re cooking and it actually just works and that type of thing. And maybe you catch your child cheating on their math homework by asking Alexa for the answer, you know.
(Scott) Five to ten years ago, voice automation was a joke, but now we’re like, there’s something to this, it’s actually working. That consumer awareness has definitely raised awareness and popularity in the world. Now people are asking the question: okay, obviously there’s something working here, so how do I get it into my business? How do I utilize this in a high-value situation? Maybe the first thing they would think is, hey, it’s working in the consumer world; Apple, Google, and Amazon probably have the best systems, so maybe I should just go with that. And it’s a good place to start, a good first demo to get in place. But really, those systems are built for the consumer world. They’re built for that command-and-control type of thing. They’re built to have a proprietary device that has multiple microphones and that type of thing. They’re not really built for multi-party conversations that run 10 minutes, 30 minutes, or two hours. And they’re not priced for it either, so it gets really expensive.
(Scott) So one of the things I would suggest you look at is: what if it works? What if it does actually do the thing that you’d like it to do? What’s the budget for that type of thing? And what kind of personnel could you put on solving the problem? What I mean is, is it going to take one engineer, or a quarter of an engineer’s time, and everything is working? Or is it going to take a dedicated team? Do you feel like it’s going to take a year or two to switch out your underlying infrastructure in order to be able to absorb voice data? A lot of customers may not even have a modern voice system. They might literally still have a PBX system in their basement, and they’re working on switching to VoIP. Actually, COVID and the pandemic have sort of helped that out. Everybody’s had to step on the gas on adopting VoIP, which is now a 20-plus-year-old technology. Nevertheless, start thinking of it from that standpoint: is voice data valuable to you? If so, and if you could pay 100 times less than a human to analyze that data, what kind of scale could you get out of it? These are pretty big questions, and they push companies down the path of thinking more like a data-driven decision-making company rather than a seat-of-the-pants one. You say: okay, I need to start getting raw data in so that I can turn that raw data into structured data, and then I can use that structured data to make decisions about my company.
(Scott) So specifically about voice, the way you do this is to think about that number one problem. I like to frame it as, “if I could just _____.” Whatever that is at the end: if I could just do this, then we would be able to build this other product and get into this other category, or whatever it is. Come up with a few of those, and then pick the easiest one that still has really good ROI. It doesn’t have to be the best ROI, but really good ROI. Then make that example a win over the next six months or a year. Now you’ll have gotten your feet wet in this conversational world and know, hey, this type of thing is probably going to pay off in this other area, or hey, it might be too complicated and not worth the effort for now, so maybe we’ll tackle that in two or three years. So it’s usually simple things that start with “if we could just ..., then we would be able to ...”. Put those sentences together for your business. Then you just have to think: how do I get a best-in-class vendor to capture the data and transform it into structured data after it’s been captured? It’s in raw form when it’s initially captured, so you have to do some transducing. Basically, you start with a raw waveform, and then you turn it into words, timings, confidences, what language was spoken, that type of thing. You put it into your data warehouse, or just make it available to your data scientists and product people. And then after that part, you visualize it or build a product with it.
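The pipeline Scott describes (raw audio turned into words, timings, confidences, and language, then loaded somewhere queryable) can be sketched roughly as follows. This is a minimal illustration only: the `asr_result` dictionary is a hypothetical shape, not any vendor's actual response schema, and an in-memory SQLite database stands in for a data warehouse.

```python
import sqlite3

# Hypothetical speech-to-text output for one call. Real vendors define
# their own schemas; the field names here are assumptions for illustration.
asr_result = {
    "call_id": "call-001",
    "language": "en",
    "words": [
        {"word": "hello", "start": 0.10, "end": 0.42, "confidence": 0.98, "speaker": 0},
        {"word": "i", "start": 0.50, "end": 0.58, "confidence": 0.95, "speaker": 1},
        {"word": "use", "start": 0.60, "end": 0.81, "confidence": 0.93, "speaker": 1},
        {"word": "acme", "start": 0.85, "end": 1.20, "confidence": 0.71, "speaker": 1},
    ],
}


def to_rows(result):
    """Flatten one transcript into one row per word."""
    return [
        (result["call_id"], result["language"], w["speaker"],
         w["word"], w["start"], w["end"], w["confidence"])
        for w in result["words"]
    ]


def load(conn, rows):
    """Create the table if needed and insert the word rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "call_id TEXT, language TEXT, speaker INTEGER, "
        "word TEXT, start_s REAL, end_s REAL, confidence REAL)"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


# SQLite in memory is a stand-in for a real warehouse.
conn = sqlite3.connect(":memory:")
load(conn, to_rows(asr_result))

# Example of a question analysts can now ask: which words were
# recognized with low confidence and may need human review?
low_confidence = conn.execute(
    "SELECT word, confidence FROM words WHERE confidence < 0.8"
).fetchall()
print(low_confidence)
```

Once the data is in rows like these, the "visualize it or build a product with it" step becomes ordinary SQL or dashboard work rather than audio processing.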
(Scott) Make sure you have your leadership conviction behind doing it because there is no one weird trick that you can do in conversational intelligence right now, that just solves all of your problems. Don’t get me wrong, you can solve a lot of your problems with it, but it does take some thought about what to attack first and what budget to put behind it and which people to put behind it.
(Courtney) So what I’m hearing is that before implementing any sort of solution, businesses really need to first establish their needs around conversation intelligence, meaning they should have some idea of what data insights they’re hoping to gain. They’re going to be able to pull out that raw data, but what is it within that data that they’re hoping to see or learn more about?
(Scott) Yeah, it could be as simple as: the marketing team would love to know when competitors are mentioned in our customer support calls. That’s keyword searching across a large amount of audio. Now, don’t go too crazy with it. Just keep that as the scope. Then build it, put it in place, and see if the data you get back from doing that one thing is valuable. If it is, now you have a win, and you probably did it in a month or so. And it gives you more confidence to step on the gas on bigger projects.
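The competitor-mention idea can be sketched as a simple keyword search over transcripts that already exist as text. Everything below is illustrative: the competitor names and call transcripts are invented, and a production system would likely also handle misspellings and ASR errors.

```python
import re
from collections import Counter

# Hypothetical competitor names and already-transcribed calls.
COMPETITORS = ["Acme Voice", "Globex"]

transcripts = {
    "call-001": "I was comparing you with Acme Voice before I called.",
    "call-002": "Can you reset my password?",
    "call-003": "Globex quoted us less, and acme voice has a free tier.",
}


def find_mentions(text, keywords):
    """Count case-insensitive, whole-phrase keyword hits in one transcript."""
    hits = Counter()
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            hits[keyword] = count
    return hits


def scan(calls, keywords):
    """Return only the calls that mention at least one keyword."""
    flagged = {}
    for call_id, text in calls.items():
        hits = find_mentions(text, keywords)
        if hits:
            flagged[call_id] = hits
    return flagged


mentions = scan(transcripts, COMPETITORS)
totals = sum(mentions.values(), Counter())  # mentions per competitor across all calls

for call_id, hits in mentions.items():
    print(call_id, dict(hits))
print("totals:", dict(totals))
```

Keeping the scope this narrow matches the advice above: one question, one small report, and a quick read on whether the data is worth more investment.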
(Courtney) And that’s really helpful for a lot of our existing customers who are looking to leverage the tools that we partner with you for, the transcriptions, which then allow us to keyword spot. So that gives them a really good framework to work from. Thanks for that.
As we had said earlier, the concept of conversation intelligence is becoming more mainstream. And as the idea grows, more businesses grasp hold of it and want to run with it.
What do you think the future holds for conversation intelligence and speech recognition software?
(Scott) I don’t think it’s an exaggeration to say that automation is going to bring a change to the world that is very similar to electricity around 1900, or to the internet from the 1990s to the present. I think the next 20 years are going to be the age of intelligence, the age of automation. I think of it as the intelligence revolution, and we are just at the very beginning of it. Right now computers, although they may seem really smart on the outside, are actually really dumb. Really smart people write very specific code for them to work the way that they do.
(Scott) Artificial intelligence allows you to, in air quotes here, “write code” that doesn’t have to be that smart. Another way to say it is that it’s focused on the outcome rather than exactly how it’s written. That allows you to build things that were impossible before. Speech recognition is one of those, image recognition is another, and text translation is another. Once you relax the rule that everything has to be written by a human, line by line, precisely, you can instead build a system that learns by example. That’s how all of these areas have taken off: speech, translation, images, etc. And we’re still at the very beginning of that. As for the future, when you think about AI you can look to Hollywood in a lot of ways, or look to Black Mirror or whatever; you could take a negative angle or a positive angle on it. I think, net overall, AI is going to bring a huge productivity increase to the world.
(Scott) What we’re going to see is that many of the things that require a human to be present right now won’t require that anymore. I liken it to when you have one of your smart best friends around. Do you search on Google for your answer? Do you pick up your phone and type into it? Or do you turn to them and say, “Hey, do you know what this is or what that is?” and have fun with them? Over the next 10 years, that’s going to become real, but with a machine. They don’t sleep, they don’t get cranky, they don’t have to eat, they don’t whine, they don’t do any of those things. They’re going to be available whenever you need them. That’s going to affect businesses in a big way, and it’s going to affect consumers in their personal lives in a big way as well, freeing up time and bringing more productivity to everybody.
(Courtney) Sure, yeah. I look forward to the day when I can call Verizon or my electricity company and make that phone call a five-minute interaction, as opposed to 20 minutes of trying to explain what I need to a robot.
(Scott) Right now, it might be kind of rocky, whether it’s a human or a machine, but in the next 10 years, you’ll call and be happy about the outcome. It’s kind of funny: maybe I interact with a bank or whatever, and every once in a while I’ll get a little moment like, wow, that was actually easy. It took less than five minutes, and I solved my problem. Now, that’s not the normal case, right? Normally, it’s 45 minutes or an hour and a half, explaining yourself multiple times. But these companies are getting better. Part of it is because of the automation under the hood, and it’s just going to keep getting better over time. It’s also going to allow them to hire a workforce that is more highly trained and more specific in their role. So if you do have to interact with a person, you’ll get routed to the right person rather than a catch-all, essentially random person, and it will be quick. That is going to change customer experience. Right now everybody dreads having to call in to a company. Over the next couple of years, it’ll still be a little rocky, but the companies that are most competitive and best at attacking this area are going to win on the customer experience side, and win overall as a company as well.
(Courtney) Yeah, I think it’s interesting, too. If we think pre-COVID, everyone was kind of wary about the phone call and what was going to happen to it, now that you could text, chat, and email companies to get an answer. But I think through COVID, what everyone realized was, wow, phone calls are still king. Everyone still wants to jump on a call to get something done. And if we can make those interactions more efficient, well, phone calls really aren’t going anywhere. So I think that’s good for both of us, right?
(Scott) And think about something that you would have sent in a Slack message: you could search for it and find it later. Now you’re starting to ask the question, man, I have all these meetings, but am I able to search them and find a previous interaction that I had? I think this is all becoming normalized as well, that it’s okay, at least in a business setting, to have meetings recorded and have a way to search them and go back in time. Maybe you weren’t able to go to that meeting, but now you can actually watch it. A lot of us watch video back at greater than 1x speed, so you can actually attend the meeting faster than if you had been there in person. You can get good notes afterward that you didn’t even have to take. Over the last year, the world has basically had to get used to this digital form of voice communication, through Zoom and the like, and that has raised the awareness that there’s something to voice. From a scientist’s perspective, from my perspective, I look at it like this: if you have a novel idea and you want to write a text or an email to somebody, think how long it takes you to write that email. It may be half an hour, an hour, or something like that. But you could clear it up with a five-minute conversation with a person, because you can tell what they understand and what they don’t understand based on their reaction to what you say. You can say, “You get what I’m saying?” You don’t have to go down that path of trying to be so complete with what you’re saying. So there is something to trying to explain a complicated idea to somebody that text doesn’t do. Images and video will do it a lot of the time, but they’re very expensive to produce. The best route is to just have a quick five-minute conversation. And that’s not going to go away.
(Courtney) I love the Zoom example, too, because there are always so many stakeholders that need to be involved in a decision or conversation. Scheduling that call to align with everyone’s schedules can be difficult. If you can just have it recorded, analyze it, and send someone who can’t be present an explanation of what was discussed, that would make everyone’s lives and jobs easier.
(Scott) Yeah, you’re seeing the rise of audio again, as well. And it’s really funny to say that, because audio was the first in a lot of ways. Think back 100 years ago or a little more: there was no TV. I guess maybe the telegraph was first, the equivalent of texting somebody, but it wasn’t widespread. Then radio came out and the telephone came out, and it was the simplest sophisticated way to communicate. And we kind of suffered, from an infrastructure perspective, from the success of that early on. You have all your switchboards. You have all of your equipment, you scale up, you put it in your basement, you do all of that, and the internet still isn’t around. It’s like 1990, and it’s all telco.
(Scott) So you made that huge investment and then the 1990s and 2000s hit and internet really booms and you’re still doing things the old way, and you got a lot of utility out of audio in the past, but now you’re starting to feel its age. And the second sort of revolution for audio is coming. And you can see it in hype around companies like Zoom and Clubhouse. The discussion format is coming back in podcasts, all of these things are on the rise right now and have shown no signs of stopping, because it is this very valuable form of communication.
(Courtney) I couldn’t agree more. Well, this has been a really great conversation. I definitely think that our listeners can now better understand what conversation intelligence is, and also envision how they can get started using it to gather data. As you said earlier, data-driven decision making; I like that term. Data is king, and we all need it to make the best decisions possible for our businesses.
So before we sign off, Scott, is there anything new or exciting you wanted to share with our listeners?
(Scott) Yeah, absolutely. We have a “How Deepgram Works” white paper that goes deeper into how automatic speech recognition works, and how you can set it up to be successful. Of course, you can work with platforms like CTM, and just have everything work. You just turn it on. But if you want to know how it works under the hood, that white paper can help you out. And also we have a few webinars that you could check out to help you understand that as well.
(Courtney) Thanks, Scott. We’ll make sure to include links to those resources, as well as your Deepgram website, in the show notes so listeners can learn more. Thanks again for joining us today. Thank you on behalf of CallTrackingMetrics for being a great partner to us. We’ve been working together for a while now, and we really look to your team to help us understand approaches to conversation intelligence that help our clients achieve success. So thank you very much, Scott. And thanks to our audience, too, for tuning in. Make sure to keep in touch with us and follow us on Twitter at @smartroutepod.