Brand impersonation is a key attack strategy in which an attacker crafts content that mimics a known brand in order to deceive a user into entering sensitive information, such as account passwords or credit card details.
To address this issue, we developed a Siamese neural network and trained it on labeled images to detect brand impersonation. Specifically, our dataset consists of over 50,000 screenshots of known malicious log-in pages covering over 1,000 brand impersonations. The Siamese network learns to embed images of the same brand close together in a low-dimensional space, while images of different brands are embedded farther apart. We then perform nearest-neighbor classification in the embedded space.
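
The embed-then-classify idea described above can be illustrated with a minimal sketch. It assumes a contrastive loss on Euclidean distance (the abstract does not state which loss was used) and a 1-nearest-neighbor lookup over already-computed embeddings. The function names, the `margin` value, and the toy two-dimensional embeddings are all hypothetical and are not taken from the actual system.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_brand, margin=1.0):
    """Assumed training objective: pull same-brand pairs together and
    push different-brand pairs at least `margin` apart."""
    d = euclidean(emb_a, emb_b)
    if same_brand:
        return d ** 2
    return max(0.0, margin - d) ** 2

def nearest_brand(query, labeled_embeddings):
    """1-nearest-neighbor classification in the embedded space.
    `labeled_embeddings` is a list of (brand_label, embedding) pairs."""
    best_label, best_dist = None, float("inf")
    for label, emb in labeled_embeddings:
        d = euclidean(query, emb)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

# Toy reference set (hypothetical labels and embeddings).
reference = [
    ("brand_a", [0.0, 0.0]),
    ("brand_a", [0.1, 0.0]),
    ("brand_b", [1.0, 1.0]),
]
label, dist = nearest_brand([0.2, 0.1], reference)
print(label)
```

In practice the embeddings would come from the trained network rather than being written by hand, and the returned distance could be compared against a threshold to flag screenshots that match no known brand; that threshold is likewise an assumption of this sketch.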
To present the results and fully characterize the performance of our Siamese network, we developed metrics that capture how well it performs on known as well as previously unseen brands. We show that the network outperforms a baseline image hashing algorithm on a held-out test set. We will then discuss further applications and planned enhancements to the baseline machine learning model.
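
For readers unfamiliar with image hashing baselines, the following is a minimal sketch of one common variant, the average hash, compared by Hamming distance. The abstract does not name the hashing algorithm that was used, so this choice is an assumption, as are the function names and the toy 2x2 grayscale images. Real implementations typically downscale the image to a small fixed size before hashing, which is omitted here.

```python
def average_hash(pixels):
    """Average hash of a grayscale image given as a 2D list of intensities:
    each bit is 1 if the pixel is brighter than the image mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of bit positions in which two equal-length hashes differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

# Toy 2x2 grayscale "images" (hypothetical values).
img_a = [[10, 200], [220, 30]]   # reference layout
img_b = [[12, 190], [210, 40]]   # same layout, slightly different intensities
img_c = [[200, 10], [30, 220]]   # inverted layout

near = hamming(average_hash(img_a), average_hash(img_b))
far = hamming(average_hash(img_a), average_hash(img_c))
print(near, far)
```

A hash of this kind tolerates small intensity changes but depends on the overall pixel layout, which is one plausible reason a learned embedding could do better on impersonation pages that rearrange or restyle brand elements.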