CONFUSION MATRIX IN CYBERCRIME

Utkarsh Srivastava
4 min read · Jun 7, 2021

What is Cybercrime?

When we hear a new term, the first question that comes to mind is what it means. As digitalization spreads rapidly, the crime related to it increases as well. Countless hackers are waiting to exploit even a minor flaw in our architecture. Cybercrime is criminal activity that either targets or uses a computer, a computer network, or a networked device. Hackers often demand a ransom in exchange for a company’s most sensitive data, which they obtained by breaking into its systems. Some cybercriminals are organized, use advanced techniques, and are highly technically skilled, so it can be very hard to tell whether somebody is snooping on your computer.

Cybercrime is treated as a criminal offence in almost every country in the world; anyone caught can face a jail sentence and a fine.

Examples of Cybercrime —

  • Email and internet fraud.
  • Identity fraud.
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyber extortion).
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
  • Cyberespionage (where hackers access government or company data).

What is a confusion matrix?

The confusion matrix traces back to Karl Pearson, who introduced it in 1904 under the name contingency table. It is a performance measurement technique for machine learning classification problems.

A confusion matrix is a table that sets a model’s predictions or test results against the real-world values. Confusion matrices are used in statistics, data mining, machine learning models, and other artificial intelligence (AI) applications. A confusion matrix can also be called an error matrix. Confusion matrices make in-depth analysis of statistical data faster and the results easier to read through clear data visualization. The tables can help analyze faults in statistics, data mining, forensics, and medical tests. A thorough analysis helps users see how and where errors are made, rather than merely assessing overall performance.

Confusion Matrix’s implementation in monitoring Cyber Attacks:

The data set discussed here was used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between “bad” connections, called intrusions or attacks, and “good” normal connections. The data set contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
In the KDD99 dataset, attacks fall into four main categories (DoS, U2R, R2L, and probe), which are further divided into 22 specific attack types.

In the KDD Cup 99, the criterion used to evaluate the participants’ entries is the Cost Per Test (CPT), computed from the confusion matrix and a given cost matrix.
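To make the CPT idea concrete, here is a minimal sketch. It assumes CPT is the cost-weighted sum of the confusion matrix entries divided by the number of test examples. The three classes, the counts, and the cost values are hypothetical placeholders chosen for illustration; they are not the competition's data or its official cost matrix.

```python
# Hypothetical 3-class example (not the official KDD Cup 99 values).
# Rows are the actual class, columns are the predicted class.
CLASSES = ["normal", "probe", "dos"]

confusion = [
    [90, 5, 5],    # actual normal
    [10, 30, 10],  # actual probe
    [0, 10, 40],   # actual dos
]

# cost[i][j] is the assumed penalty for predicting class j
# when the true class is i; correct predictions cost 0.
cost = [
    [0, 1, 2],
    [1, 0, 2],
    [2, 1, 0],
]


def cost_per_test(confusion, cost):
    """Average misclassification cost over all test examples."""
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total


cpt = cost_per_test(confusion, cost)
print(f"Cost per test: {cpt:.3f}")
```

A lower CPT means the detector makes fewer, or cheaper, mistakes under the chosen cost matrix.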
True Positive (TP): The number of connections classified as attacks that are actually attacks.
True Negative (TN): The number of connections classified as normal that are actually normal.
False Positive (FP): The number of connections classified as attacks that are actually normal (false alarms).
False Negative (FN): The number of connections classified as normal that are actually attacks (missed attacks).
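The four counts above can be tallied directly from a list of true labels and a list of predicted labels. The following is a minimal sketch; the function name `confusion_counts` and the sample labels are made up for illustration, with "attack" treated as the positive class.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count TP, TN, FP and FN, treating `positive` as the attack class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # attack flagged, really an attack
        elif p != positive and a != positive:
            tn += 1  # normal flagged, really normal
        elif p == positive:
            fp += 1  # false alarm
        else:
            fn += 1  # missed attack
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels for six connections.
actual = ["attack", "attack", "normal", "normal", "attack", "normal"]
predicted = ["attack", "normal", "normal", "attack", "attack", "normal"]

counts = confusion_counts(actual, predicted)
print(counts)
```

In intrusion detection the two error cells are rarely equally costly: a false alarm wastes an analyst's time, while a missed attack may go unnoticed entirely.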

Conclusion:

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to evaluate a classification model by calculating performance metrics such as accuracy, precision, recall, and F1-score.
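As a rough illustration, these metrics follow from the four cells of a binary confusion matrix using their standard definitions. The helper name `classification_metrics` and the counts passed to it are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics derived from binary confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything flagged as an attack, how much really was one.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real attacks, how many were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-score: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a detector tested on 200 connections.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can mislead when attacks are rare, which is why precision and recall are usually reported alongside it.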

Need for Confusion Matrix in Machine learning:
  • It evaluates how well a classification model performs when it makes predictions on test data.
  • It shows not only how many errors the classifier makes but also which type they are: type-I errors (false positives) or type-II errors (false negatives).
  • It lets us calculate different metrics for the model, such as accuracy and precision.
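The two error types can each be turned into a rate. Here is a small sketch using the usual definitions of false positive rate (type-I) and false negative rate (type-II); the function name `error_rates` and the counts are hypothetical.

```python
def error_rates(tp, tn, fp, fn):
    """Return (type-I rate, type-II rate) from confusion matrix counts."""
    # Type-I error rate: share of normal connections flagged as attacks.
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    # Type-II error rate: share of real attacks classified as normal.
    false_negative_rate = fn / (fn + tp) if (fn + tp) else 0.0
    return false_positive_rate, false_negative_rate


# Hypothetical counts: 100 normal connections and 100 attacks.
type_1, type_2 = error_rates(tp=80, tn=90, fp=10, fn=20)
print(f"Type-I (false alarm) rate: {type_1:.2f}")
print(f"Type-II (missed attack) rate: {type_2:.2f}")
```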

The confusion matrix is a matrix used to determine the performance of the classification models for a given set of test data. It can only be determined if the true values for test data are known. The matrix itself can be easily understood and implemented to test an ML model.

SOURCE:-

https://www.reddit.com/r/CyberAttacks/comments/nrznnj/what_is_confusion_matrix_and_its_implementation/

Thank you for your read!
