This project was done as part of my EPQ over 6 months. I’ve put the abstract and the introduction here to give you a flavour of the project, but the rest of it (all formatted nicely in LaTeX) can be found here.

Abstract

As techniques for hiding information (steganography) become increasingly difficult to detect, and as much higher resolution carrier images come into use, statistical approaches to the detection of steganography are becoming more and more complex. We propose a technique for detecting the F5 steganographic algorithm using a simpler set of statistics than previous methods. We do this by classifying images with a neural network that takes the simpler statistics as input. We show that the neural network can successfully detect the presence of the F5 steganographic algorithm regardless of the size of the payload encoded in the image, and that accuracy increases as the complexity of the statistics is increased.

Introduction

Steganography is the art of hiding information inside other information. Notable examples of steganography include technologies such as invisible ink and microdots (both traditionally used by spies). More recently, there was evidence of steganography being used by Russian spies in the US to communicate with their handlers back in Russia [Pincus, 2010]. With the increasing popularity of social media, and therefore the growing number of images on the internet, the opportunity to use steganography is only going to increase.

Picture with several types of flowers, used by Richard Murphy to communicate with SVR center. [FBI, 2012]
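To make the idea of hiding information inside other information concrete, here is a minimal sketch of least-significant-bit (LSB) substitution on a list of 8-bit pixel values. This is not the F5 algorithm studied in this project, which operates on JPEG coefficients rather than raw pixel values; it is simply the easiest scheme to show in a few lines. The function names `embed` and `extract` and the synthetic `cover` data are made up for illustration.

```python
def embed(pixels, message):
    """Hide the bytes of `message` in the lowest bit of each pixel value."""
    # Unpack the message into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover is too small for this message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract(pixels, n_bytes):
    """Read `n_bytes` of hidden data back out of the lowest bits."""
    bits = [p & 1 for p in pixels[: n_bytes * 8]]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i : i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


# Synthetic "image": 64 grey levels in the range 0..255 (illustrative data only).
cover = [(37 * i + 11) % 256 for i in range(64)]
stego = embed(cover, b"hi")
recovered = extract(stego, 2)
```

Each pixel changes by at most one grey level, which is why the change is invisible to the eye and why detection has to rely on statistics rather than inspection.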

Neural networks are algorithms that “learn” patterns in data by loosely modelling the neurons in a brain, and they perform exceedingly well at learning non-linear tasks. They have recently become very popular, starting with AlexNet [Krizhevsky et al., 2012] at the 2012 ImageNet competition [Russakovsky et al., 2015a]. Although neural networks date back to Frank Rosenblatt’s work in 1958, it is the availability of large datasets and plentiful processing power (GPUs) that has allowed them to take off.
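As a rough illustration of what “learning” means here, the sketch below trains a single artificial neuron (a perceptron, the kind of unit Rosenblatt described) to reproduce the logical OR function. It is not the network used in this project, and all names in it are made up for illustration. A single neuron like this can only separate classes with a straight line; the non-linear tasks mentioned above need several layers of such units.

```python
def step(x):
    """Threshold activation: the neuron fires (1) only for positive input."""
    return 1 if x > 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=10, lr=1):
    """Perceptron learning rule: nudge the weights whenever a prediction is wrong."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Truth table for logical OR: ((input_1, input_2), expected_output).
OR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_perceptron(OR_GATE)
predictions = [predict(weights, bias, inputs) for inputs, _ in OR_GATE]
```

The weights start at zero and are only ever adjusted in response to mistakes, so whatever the neuron ends up “knowing” about OR has come entirely from the examples.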

Because steganalysis (the study of detecting steganography) has become more and more statistically complex, it was decided to explore to what extent a more developed classification algorithm (a neural network) could make up for a statistically less complex feature set when trying to detect the presence of the F5 steganographic algorithm [Westfeld, 2001a]. If classification algorithms can indeed make up for a less statistically complex feature set, these techniques could be applied to steganographic algorithms that are harder to detect. Current techniques, which pair their features with simple classification algorithms, might then become more successful when paired with a more complex classifier.

This paper will look at the background and mechanisms behind steganography (including the JPEG compression standard), neural networks and steganalysis. The method will then be described and justified, and the data analysed. Finally, the results and their implications will be explained.