AnomalyHunter detects and analyzes anomalies in log files. The project includes services for MySQL database management, anomaly detection using the DBSCAN algorithm, and logging mechanisms that handle and process log data.
- Installation
- Usage
- Services
- Scripts
- Database and Tables
- Log Files
- Cron Job
- Testing SQL Commands
- Contributing
- License
This section describes how to install and set up AnomalyHunter on your server.

Prerequisites:
- Ubuntu 18.04 or later
- Python 3.6 or later
- MySQL Server

- Clone the repository:

      git clone https://github.com/ashokchokalingam/AnomalyHunter.git
      cd AnomalyHunter

- Run the setup script:

      sudo bash setup.sh

The setup script will:
- Install MySQL server and secure the installation.
- Create a MySQL database and user.
- Install Python 3 and pip.
- Install required Python packages.
- Initialize the SQL tables by running `Initializer_DB.py`.
- Create systemd services for `SQL.py`, `dbscan.py`, and `logger.py`.
- Set up a cron job to run `truncatesyslog.py` every hour.
After installation, the services will be up and running. You can manage the services using systemd commands:

- Start a service:

      sudo systemctl start <service-name>

- Stop a service:

      sudo systemctl stop <service-name>

- Check the status of a service:

      sudo systemctl status <service-name>

The cron job automatically runs `truncatesyslog.py` every hour to truncate logs older than 24 hours.

Service for `SQL.py`:
- Description: Manages the `SQL.py` script.
- Dependencies: `network.target`

Service for `dbscan.py`:
- Description: Manages the `dbscan.py` script.
- Dependencies: `sql.service`

Service for `logger.py`:
- Description: Manages the `logger.py` script.
- Dependencies: `dbscan.service`
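
The dependencies above form a chain, so the services have a single valid start order. The following is a minimal sketch that derives that order from the listed dependencies. The unit name `logger.service` is an assumption (the text only names `logger.py`), the helper `start_order` is hypothetical, and `graphlib` requires Python 3.9 or later, which is newer than the stated minimum. How strictly systemd enforces the order depends on the directives in the actual unit files, which are not shown here.

```python
from graphlib import TopologicalSorter

# Unit -> units it depends on, as listed above.
# "logger.service" is an assumed unit name for the logger.py service.
DEPENDENCIES = {
    "sql.service": {"network.target"},
    "dbscan.service": {"sql.service"},
    "logger.service": {"dbscan.service"},
}


def start_order(dependencies):
    """Return units ordered so each one comes after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(start_order(DEPENDENCIES)))
```

Read in reverse, the same order is a reasonable sequence for stopping the services by hand.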

`setup.sh`: The main setup script. It installs all necessary components and sets up the services and the cron job.

`Initializer_DB.py`: Script to initialize the SQL tables.

`SQL.py`:
- Handles SQL-related tasks such as inserting, updating, and querying the database.
- Manages the interaction between the application and the MySQL database, so that data is stored and retrieved efficiently.
- Contains functions to initialize SQL tables, ensure columns exist, read and update bookmark files, process log files, and truncate old data.
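
A bookmark file typically records how far into a log file processing has reached, so that each run only handles new lines. The following is a minimal sketch of that idea, not the code in `SQL.py`. The function names, the bookmark format (a single byte offset stored as text), and the reset-on-shrink behaviour are all assumptions.

```python
import os


def read_bookmark(bookmark_path):
    """Return the saved byte offset, or 0 if no bookmark exists yet."""
    try:
        with open(bookmark_path, "r", encoding="utf-8") as fh:
            return int(fh.read().strip() or 0)
    except FileNotFoundError:
        return 0


def write_bookmark(bookmark_path, offset):
    """Persist the byte offset up to which the log has been processed."""
    with open(bookmark_path, "w", encoding="utf-8") as fh:
        fh.write(str(offset))


def read_new_lines(log_path, bookmark_path):
    """Return complete lines added since the last call and advance the bookmark."""
    offset = read_bookmark(bookmark_path)
    # If the log shrank (for example after truncation), the offset is stale.
    if offset > os.path.getsize(log_path):
        offset = 0
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Consume only complete lines; a partial trailing line waits for the next run.
    end = data.rfind(b"\n") + 1
    write_bookmark(bookmark_path, offset + end)
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

Storing a byte offset rather than a line count keeps each run proportional to the amount of new data, at the cost of needing the reset check when the file is truncated.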

`dbscan.py`:
- Performs anomaly detection using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm.
- Processes log data to identify clusters and flag anomalies, which helps in detecting unusual patterns that may indicate security incidents or system issues.
- Includes functions to fetch data from the database, preprocess data, run DBSCAN clustering, and update the database with cluster labels.
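
To make the clustering step concrete, the following is a small pure-Python sketch of DBSCAN. It is not the implementation in `dbscan.py`, which may well rely on a library. The function names, the `eps` and `min_samples` parameters, and the use of plain numeric tuples as features are assumptions; the text does not say how log rows are converted into feature vectors. The label `-1` for noise matches the `dbscan_cluster = -1` value used for outliers elsewhere in this document.

```python
NOISE = -1


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise (outliers)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(sample, eps=1.5, min_samples=3))
```

This version compares every pair of points, so it is only suitable for illustration; a production setup would normally use an indexed neighbor search.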

`logger.py`: Script to handle logging of detected anomalies.

`truncatesyslog.py`: Script to truncate log entries older than 24 hours.
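
The 24-hour retention rule can be sketched as a filter over log lines. This is an illustration only, not the code in `truncatesyslog.py`. It assumes each line starts with a timestamp in the form `YYYY-MM-DDTHH:MM:SS` followed by a space, which the text does not confirm, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumed line prefix format


def keep_recent(lines, now, max_age=timedelta(hours=24)):
    """Return only the lines whose leading timestamp is within max_age of now."""
    cutoff = now - max_age
    kept = []
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.strptime(stamp, TIMESTAMP_FORMAT)
        except ValueError:
            kept.append(line)  # no parseable timestamp: keep rather than lose data
            continue
        if when >= cutoff:
            kept.append(line)
    return kept


def truncate_file(path, now=None):
    """Rewrite the file at path, dropping entries older than 24 hours."""
    now = now or datetime.now()
    with open(path, "r", encoding="utf-8") as fh:
        lines = fh.readlines()
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(keep_recent(lines, now))
```

Note that rewriting the file in place is not atomic; a writer appending to the log during the rewrite could lose lines, so a real script may need locking or a rename-based approach.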

- Tables:
  - `sigma_alerts`: Stores information about detected alerts.
  - `dbscan_outlier`: Stores details of detected outliers.

Table Details:
- `sigma_alerts`:
  - `id`: Unique identifier for each alert.
  - `title`: Title of the alert.
  - `tags`: Tags associated with the alert.
  - `description`: Description of the alert.
  - `system_time`: Timestamp of the alert.
  - `computer_name`: Name of the computer where the alert was generated.
  - `user_id`: User ID associated with the alert.
  - `event_id`: Event ID of the alert.
  - `provider_name`: Name of the provider generating the alert.
  - `dbscan_cluster`: Cluster value assigned by the DBSCAN algorithm.
  - `raw`: Raw log data.
- `dbscan_outlier`:
  - `id`: Unique identifier for each outlier.
  - `title`: Title of the outlier.
  - `tags`: Tags associated with the outlier.
  - `description`: Description of the outlier.
  - `system_time`: Timestamp of the outlier.
  - `computer_name`: Name of the computer where the outlier was generated.
  - `user_id`: User ID associated with the outlier.
  - `event_id`: Event ID of the outlier.
  - `provider_name`: Name of the provider generating the outlier.
  - `dbscan_cluster`: Cluster value assigned by the DBSCAN algorithm.
  - `raw`: Raw log data.
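
Both tables share the same column names. The following sketch creates them in an in-memory SQLite database as a stand-in for MySQL, which can be handy for experimenting without a server. The column types are assumptions (the text lists only names and meanings), and the helper functions are hypothetical; the real schema is whatever `Initializer_DB.py` creates.

```python
import sqlite3

# Column names come from the list above; the types are assumed for illustration.
COLUMNS = [
    ("id", "INTEGER PRIMARY KEY"),
    ("title", "TEXT"),
    ("tags", "TEXT"),
    ("description", "TEXT"),
    ("system_time", "TEXT"),
    ("computer_name", "TEXT"),
    ("user_id", "TEXT"),
    ("event_id", "TEXT"),
    ("provider_name", "TEXT"),
    ("dbscan_cluster", "INTEGER"),
    ("raw", "TEXT"),
]


def create_tables(conn):
    """Create sigma_alerts and dbscan_outlier with identical columns."""
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in COLUMNS)
    for table in ("sigma_alerts", "dbscan_outlier"):
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_sql})")


def column_names(conn, table):
    """Return the column names of a table, in definition order."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_tables(connection)
    print(column_names(connection, "sigma_alerts"))
```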

- Location: `/var/log/anomalyhunter/`
- Log Files:
  - `anomaly.syslog`: Main log file where all anomalies are recorded.

A cron job runs `truncatesyslog.py` every hour to maintain the log files by truncating entries older than 24 hours.

Here are some SQL commands you can use to test and inspect the contents of the `sigma_alerts` table:

- Count All Rows:

      SELECT COUNT(*) FROM sigma_alerts;

  This command counts the total number of rows in the `sigma_alerts` table.

- Group by Title and Count Occurrences:

      SELECT title, COUNT(*) AS count FROM sigma_alerts GROUP BY title;

  This command groups the rows by the `title` column and counts the occurrences of each title.

- Group by Title and `dbscan_cluster` Excluding `-1` and Count Occurrences:

      SELECT title, dbscan_cluster, COUNT(*) AS count FROM sigma_alerts WHERE dbscan_cluster != -1 GROUP BY title, dbscan_cluster;

  This command groups the rows by `title` and `dbscan_cluster`, excluding rows where `dbscan_cluster` is `-1`, and counts the occurrences. Rows where `dbscan_cluster` is NULL are also excluded, because a comparison with NULL is never true.

- Group by Title for `dbscan_cluster = -1` and Count Occurrences:

      SELECT title, COUNT(*) AS count FROM sigma_alerts WHERE dbscan_cluster = -1 GROUP BY title;

  This command groups the rows by `title` for rows where `dbscan_cluster` is `-1` and counts the occurrences.

- Select 100 Random Rows:

      SELECT id, title, tags, computer_name, user_id, provider_name, dbscan_cluster FROM sigma_alerts ORDER BY RAND() LIMIT 100;

  This command returns up to 100 rows in random order, showing a subset of the columns.

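
The grouping queries above can also be tried from Python. The sketch below runs them against an in-memory SQLite table as a stand-in for MySQL, using a reduced table and made-up rows purely for illustration. The function names are hypothetical, and an `ORDER BY` is added so the results come back in a predictable order. The random-sample query is left out because SQLite spells the function `RANDOM()` rather than `RAND()`.

```python
import sqlite3


def demo_connection():
    """Build an in-memory table with a few made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sigma_alerts ("
        "id INTEGER PRIMARY KEY, title TEXT, dbscan_cluster INTEGER)"
    )
    rows = [
        ("alert A", 0), ("alert A", 0), ("alert A", -1),
        ("alert B", 1), ("alert B", -1), ("alert B", -1),
    ]
    conn.executemany(
        "INSERT INTO sigma_alerts (title, dbscan_cluster) VALUES (?, ?)", rows
    )
    return conn


def total_rows(conn):
    """Count all rows in sigma_alerts."""
    return conn.execute("SELECT COUNT(*) FROM sigma_alerts").fetchone()[0]


def clustered_counts(conn):
    """Count rows per (title, cluster), excluding outliers labeled -1."""
    return conn.execute(
        "SELECT title, dbscan_cluster, COUNT(*) AS count FROM sigma_alerts "
        "WHERE dbscan_cluster != -1 "
        "GROUP BY title, dbscan_cluster ORDER BY title, dbscan_cluster"
    ).fetchall()


def outlier_counts(conn):
    """Count outlier rows (cluster -1) per title."""
    return conn.execute(
        "SELECT title, COUNT(*) AS count FROM sigma_alerts "
        "WHERE dbscan_cluster = -1 GROUP BY title ORDER BY title"
    ).fetchall()


if __name__ == "__main__":
    connection = demo_connection()
    print(total_rows(connection))
    print(clustered_counts(connection))
    print(outlier_counts(connection))
```
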
Contributions are welcome! Please fork the repository and create a pull request with your changes. Ensure your code follows the existing style and passes all tests.
This project is licensed under the MIT License. See the LICENSE file for details.