Can artificial intelligence save your data?

It is sixty years since scientists gathered for their historic meeting about artificial intelligence at Dartmouth College in New Hampshire. It was a month-long brainstorming session during which they explored the possibilities of thinking computers, and it kick-started a field of research that we’re still benefiting from today. AI is revolutionizing areas ranging from transport to translation. Can it do the same for cybersecurity?

Much of the action in AI today revolves around machine learning. It’s a subset of artificial intelligence that enables computers to learn from lots of data without being explicitly programmed. The idea is that if you get computers to look at enough old data, they can start recognizing patterns in it that will help them to understand new information and do smart things with it. This includes making predictions about the future, and spotting patterns that people might miss.

Running to keep up

At the very least, machine learning will let computers do things more quickly than human beings can. We need this kind of thing badly in cybersecurity because cyber attacks are themselves becoming more automated, said Mike Stute, chief scientist at managed security and cloud-based communications firm Masergy. He is the chief architect of the company’s network behavioural analysis system.

“More data is coming faster and faster, so this concept of being able to have human intelligence drive this thing is already somewhat antiquated,” he said, adding that while we have no proof of artificial intelligence being used in attacks, some of them evolve so quickly that its involvement is self-evident.

Artificial intelligence comes in two broad forms: generalized and specialized. The first is the kind that science fiction writers have envisaged for years; the one that can hold an intelligent conversation with a human being, and exhibit signs of consciousness, feeling emotions and effectively mimicking the brain.

We are still a long way from that dream, and some recent attempts at it have failed dismally. Microsoft’s Tay was a machine learning-driven social media bot designed to learn from conversations with people online. She turned into a racist, drug-toting psychopath after the Internet got its hands on her for a few hours.

Narrow task sets

While we stumble blindly towards a generalized AI, specialized artificial intelligence presents more immediate possibilities. It is an artificial data structure designed to be good at just one thing, such as speech or image recognition, or perhaps pattern matching.

It is this part of AI that carries some promise for cyber security applications, suggest advocates. Machine learning lies at the heart of that effort, because it involves ingesting and drawing inferences from data. Mounds and mounds of data.

“We finally have some of the compute power and storage and have built systems on top of systems that are complex enough that we can make some headway,” said Rob Clyde, International VP of information systems governance and security association ISACA.

AI can detect anomalous behaviour that varies from baselines, he explained, adding that it can also discover new insights about security. He recalled one such analysis that came out of a recent machine learning project. It revealed that if any endpoint had more than 25 different identities in use, it had almost certainly been hacked.
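Once an analysis has surfaced a rule like that, applying it is simple. The sketch below is a minimal illustration, not the project Clyde described: it assumes authentication events arrive as hypothetical (endpoint, identity) pairs, and the function name and sample data are invented for the example. Only the threshold of 25 comes from the figure quoted above.

```python
from collections import defaultdict

# Threshold taken from the figure quoted in the article; the rest is illustrative.
IDENTITY_THRESHOLD = 25


def flag_suspect_endpoints(auth_events, threshold=IDENTITY_THRESHOLD):
    """Return endpoints that have seen more than `threshold` distinct identities.

    `auth_events` is an iterable of (endpoint, identity) pairs. This event
    format is an assumption made for the example.
    """
    identities = defaultdict(set)
    for endpoint, identity in auth_events:
        # A set keeps each identity once, however often it logs in.
        identities[endpoint].add(identity)
    return sorted(ep for ep, ids in identities.items() if len(ids) > threshold)


# Hypothetical sample data: one endpoint with 30 identities, one with 2.
events = [("laptop-17", f"user{i}") for i in range(30)]
events += [("laptop-02", "alice"), ("laptop-02", "bob"), ("laptop-02", "alice")]

suspects = flag_suspect_endpoints(events)
print(suspects)
```

Note that this is a fixed rule, not machine learning in itself. In Clyde's account, the learning step was what revealed that the number of identities mattered and where the cut-off lay.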

“Another company was able to detect that users were simultaneously logging into VPNs with the same ID,” he said. “Either they were splitting themselves into multiple people, or they were sharing those IDs and being hacked.”
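A check for simultaneous logins can be sketched in a few lines. This is an illustration under stated assumptions, not the method that company used: it supposes VPN sessions are available as hypothetical (user ID, start, end) records, and the function name and sample sessions are made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def find_concurrent_logins(sessions):
    """Return user IDs with at least two VPN sessions that overlap in time.

    `sessions` is an iterable of (user_id, start, end) tuples. This record
    format is an assumption made for the example.
    """
    by_user = defaultdict(list)
    for user_id, start, end in sessions:
        by_user[user_id].append((start, end))

    flagged = []
    for user_id, spans in by_user.items():
        spans.sort()  # order each user's sessions by start time
        latest_end = spans[0][1]
        for start, end in spans[1:]:
            if start < latest_end:
                # This session began before an earlier one had finished.
                flagged.append(user_id)
                break
            latest_end = max(latest_end, end)
    return sorted(flagged)


def at(hour, minute):
    """Build a timestamp on an arbitrary, hypothetical day."""
    return datetime(2016, 1, 1, hour, minute)


# Hypothetical sample data: carol's sessions overlap, dave's do not.
sessions = [
    ("carol", at(9, 0), at(11, 0)),
    ("carol", at(10, 30), at(12, 0)),
    ("dave", at(9, 0), at(10, 0)),
    ("dave", at(10, 0), at(11, 0)),
]

shared_ids = find_concurrent_logins(sessions)
print(shared_ids)
```

A session that starts at the exact moment another ends is treated as a reconnect and not as an overlap. That is a design choice for this sketch, and a real deployment might also compare source addresses or locations.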

Neither of these insights may seem particularly ground-breaking, but when you’re dealing with tens of thousands of users and endpoints, they can highlight emerging problems and help human managers to prioritize their efforts. Machine learning can also help large companies to reduce false positives inherent in masses of network information, which is something that the SecTor blog has covered before.

“I’m sorry, Dave. I can’t do that.”

Still, AI can make its own mistakes. Google’s gorilla gaffe from last year was one of the most notable. Can we trust it not to foul up our cybersecurity, too?

This is where we have to be careful about AI and machine learning. All too often these days, companies are inclined to label things as AI and sprinkle magic glitter on them, sweeping aside the challenges and marketing them as a cure-all. Sadly, that isn’t always the case, warned Stute.

“It doesn’t lend itself to cybersecurity as easily as it does to image recognition,” he said. “The space is a little confused. There’s a lot of talk about it. The model that people [sometimes] use is to ship it offsite, do their machine learning and ship a profile out to all their customers. The problem with that is that everyone gets the same profile, so anything that gets past one company gets past them all.”

Finding the experts to customize the data analysis to suit your particular situation can also be difficult, he warned. This tempts companies to simply farm out large data sets to a third party and have them apply more generic pattern matching, he suggested.

One of the biggest challenges for AI has historically been one of the biggest challenges for computing: garbage in, garbage out. Without enough good data, you’ll end up with a poorly-trained model, Stute said: “To train these things well, there’s a lot of data to run through a large network.”

Analyzing your data dynamically can be difficult to do in the cloud, because replicating all of your packets there would double the bandwidth you’d be using. The alternative is to send metadata, but then you may be missing things underlying that data, and if you send the wrong metadata your model won’t work at all, he warned.

Picking your battles

Does this make artificial intelligence unsuitable for cyber security? Not at all, said Clyde. “We are reaching narrow areas where we are getting very good,” he said. He was formerly CTO at Symantec, and recalls the company using AI to generate new signatures automatically from incoming malware. “Detecting malicious websites and blocking them by reputation was one example where we got quite automated,” he said.

The idea of AI handling our entire cybersecurity operations in a ‘lights out’ scenario is unrealistic, he admits. “Being able to detect every possible attack and automatically respond to it? No, we’re still a long way from that.”

When dealing with a company claiming to use artificial intelligence and machine learning, it pays to ask them what databases they’re using at the back end, to help you understand whether they’re really analyzing big data or not, Clyde said. Drill down into the analytical model that they’re using. Is it really using machine learning, or is it simply employing smart people to manipulate your data behind the scenes? And can they back up their claims with metrics showing that they outperform human operators – or at least match them, while reducing costs?

“There are definitely solutions out there that are truly using big data and machine learning as a way to do cybersecurity and detect things,” he said. But it pays to approach all claims with some skepticism.


