The exponential growth of computing and the internet has introduced numerous cybersecurity challenges that change constantly over time. Current cybersecurity solutions are no longer adequate for tackling these emerging threats and attacks. This paper proposes a performance evaluation of available machine learning (ML) algorithms, together with the generation of a cybersecurity dataset for supervised and unsupervised learning, with the aim of building an effective intrusion detection system. The proposed model is a six-stage process. It starts with setting up the network infrastructure environment to generate the dataset required for modelling. The generated data feeds into the data preprocessing stage and then the data cleaning stage, which removes erroneous and incomplete entries. The cleaned data feeds into the data transformation stage, where symbolic features are represented numerically, and then into the data normalization stage, which rescales feature values to a common range. Feature reduction then reduces the dataset to the features relevant for modelling. Finally, classification with several ML algorithms is carried out using three options on the dataset. The proposed model is expected to provide a high-quality dataset and an efficient intrusion detection system, characterized by high intrusion detection accuracy, a short training time, and a low false-positive rate, which together form the basis for comparison.
Author: Maxwell Eichie
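
The preprocessing stages described in the abstract (cleaning, transformation, normalization, feature reduction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the function names, the sample column names, the use of integer codes for symbolic features, min-max scaling as the normalization method, and dropping constant columns as a stand-in for feature reduction.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[str, float, None]
Row = List[Value]


def clean(rows: Sequence[Row]) -> List[Row]:
    """Data cleaning: drop rows containing a missing value (None or "")."""
    return [list(r) for r in rows if all(v is not None and v != "" for v in r)]


def encode_symbolic(rows: Sequence[Row]) -> List[List[float]]:
    """Data transformation: replace symbolic (string) columns with numeric codes."""
    if not rows:
        return []
    encoded = [list(r) for r in rows]
    for j in range(len(rows[0])):
        if any(isinstance(r[j], str) for r in rows):
            # Codes follow sorted order so the mapping is deterministic.
            # Assumption: plain integer codes; one-hot encoding is a common
            # alternative that avoids implying an order between categories.
            codes = {s: i for i, s in enumerate(sorted({str(r[j]) for r in rows}))}
            for r in encoded:
                r[j] = float(codes[str(r[j])])
        else:
            for r in encoded:
                r[j] = float(r[j])
    return encoded


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Data normalization: rescale each column to [0, 1]."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    # A constant column has no range to rescale, so it is mapped to 0.0.
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]


def drop_constant_features(
    rows: Sequence[Sequence[float]], names: Sequence[str]
) -> Tuple[List[List[float]], List[str]]:
    """Feature reduction (simplified): keep only columns whose values vary."""
    keep = [j for j in range(len(names)) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows], [names[j] for j in keep]


def preprocess(
    rows: Sequence[Row], names: Sequence[str]
) -> Tuple[List[List[float]], List[str]]:
    """Chain the stages in the order given in the abstract."""
    cleaned = clean(rows)
    numeric = encode_symbolic(cleaned)
    scaled = min_max_normalize(numeric)
    return drop_constant_features(scaled, names)


# Hypothetical connection records; the column names are illustrative only.
sample_names = ["duration", "protocol", "bytes", "flag"]
sample_rows: List[Row] = [
    [2.0, "tcp", 100.0, "SF"],
    [4.0, "udp", 300.0, "SF"],
    [None, "tcp", 50.0, "SF"],   # incomplete entry, removed by clean()
    [6.0, "icmp", 500.0, "SF"],
]
features, kept_names = preprocess(sample_rows, sample_names)
```

In this sample, the row with a missing duration is removed during cleaning, and the `flag` column is dropped during feature reduction because it carries the same value in every remaining row. A real pipeline would likely use a stronger feature-selection criterion before passing `features` to the classifiers.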