Angad is a framework that automates the classification of unlabeled malware datasets using multi-dimensional modelling. It analyzes the input dataset to collect various attributes, which are then arranged into several feature vectors. These vectors are individually visualized and indexed, and the index is queried for each new input file. Matching vectors are currently labelled according to their AV detection categories, though this could be changed to a heuristics-based approach if needed. If dynamic behavior or network traffic details are available, vectors are also converted into activity graphs that depict how activity evolves over a predefined timescale. The result is an animation of the behavior traits of a malware sample or malware category, which is also useful for identifying activity overlaps across the input dataset.
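The index-then-query step described above can be illustrated with a minimal sketch. This is not Angad's actual API: the names VectorIndex, label_sample and min_similarity, the choice of cosine similarity, the majority vote, and the three-element feature vectors are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorIndex:
    """Hypothetical in-memory index of (sample id, feature vector, AV label)."""

    def __init__(self):
        self._entries = []

    def add(self, sample_id, vector, av_label):
        self._entries.append((sample_id, list(vector), av_label))

    def query(self, vector, k=3, min_similarity=0.9):
        """Return up to k indexed entries whose similarity meets the threshold."""
        scored = [
            (cosine_similarity(vector, stored), sample_id, label)
            for sample_id, stored, label in self._entries
        ]
        matches = [m for m in scored if m[0] >= min_similarity]
        matches.sort(key=lambda m: m[0], reverse=True)
        return matches[:k]


def label_sample(index, vector, k=3, min_similarity=0.9):
    """Label a new vector by majority vote over its matches, or None if no match."""
    matches = index.query(vector, k=k, min_similarity=min_similarity)
    if not matches:
        return None
    votes = Counter(label for _, _, label in matches)
    return votes.most_common(1)[0][0]


# Made-up vectors and labels, standing in for attributes collected from samples.
index = VectorIndex()
index.add("sample-1", [0.90, 0.10, 0.80], "ransomware")
index.add("sample-2", [0.85, 0.15, 0.75], "ransomware")
index.add("sample-3", [0.10, 0.90, 0.20], "adware")

print(label_sample(index, [0.88, 0.12, 0.79]))
print(label_sample(index, [0.0, 0.0, 1.0]))
```

A linear scan like this only suits small datasets. At thousands of samples per day, a real implementation would presumably rely on a proper similarity index, and the similarity threshold would need tuning per feature vector.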
Malware detection is a challenging task because the landscape is ever-evolving. Every other day, a new variant of a known malware family is reported, and signature-driven tools race against time to add detection. The problem worsens when thousands of samples arrive daily, at which point static or dynamic analysis alone is no longer sufficient. Angad tries to address this issue by applying well-known data classification techniques to the malware domain. It aims to provide a familiar interface to the multi-dimensional modelling approach within a standalone package.
Although the demo focuses primarily on PE files, the intention is to show how multi-dimensional modelling techniques can be applied to other file formats (PDF, OLE, etc.) that commonly carry exploits or shellcode.
Attendees will learn how to apply multi-dimensional modelling to malware classification and how to use the framework and its APIs to integrate it into their own toolsets. All material for this talk and demo will be released as open source and hosted on GitHub.