
Merging recomm_sys #72

Open · wants to merge 23 commits into master
Conversation

akkadhim commented Dec 25, 2024

This PR adds the recommendation system experiments. Please ignore any changes outside the examples/recomm_system directory.

BooBSD commented Dec 26, 2024

@akkadhim Could you please export your noisy datasets to a CSV file for testing in other languages?

akkadhim (Author) replied:

> @akkadhim Could you please export your noisy datasets to a CSV file for testing in other languages?

Sure, below are the datasets for the different noise ratios:

noisy_dataset_0.05.csv
noisy_dataset_0.005.csv
noisy_dataset_0.02.csv
noisy_dataset_0.2.csv
noisy_dataset_0.01.csv
noisy_dataset_0.1.csv
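
For context, a minimal sketch of how files like these could be produced (illustrative only, not the PR's code; the `rating` target column and the flip-based noise model are assumptions):

```python
# Illustrative sketch, not the code from examples/recomm_system.
# Flips a `ratio` fraction of target values to simulate label noise,
# then writes one CSV per ratio, mirroring the file names above.
import numpy as np
import pandas as pd

def export_noisy_dataset(df: pd.DataFrame, ratio: float, seed: int = 0) -> None:
    rng = np.random.default_rng(seed)
    noisy = df.copy()
    rows = rng.choice(len(noisy), size=int(ratio * len(noisy)), replace=False)
    # "rating" is a hypothetical target column; overwrite with random valid values.
    noisy.iloc[rows, noisy.columns.get_loc("rating")] = rng.choice(
        df["rating"].unique(), size=len(rows)
    )
    noisy.to_csv(f"noisy_dataset_{ratio}.csv", index=False)

for ratio in (0.005, 0.01, 0.02, 0.05, 0.1, 0.2):
    export_noisy_dataset(base_df, ratio)  # base_df: the expanded dataset
```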

BooBSD commented Dec 27, 2024

@akkadhim Thank you!

BooBSD commented Dec 27, 2024

@akkadhim Is it correct that, after one-hot booleanization, your input data consists of 10709 bits? This includes 1350 unique product_ids + 317 categories + 9042 user_ids.

akkadhim (Author) replied:

> @akkadhim Is it correct that, after one-hot booleanization, your input data consists of 10709 bits? This includes 1350 unique product_ids + 317 categories + 9042 user_ids.

After expanding the original dataset and adding the noise, the unique feature counts are:

Users: 1193
Items: 1350
Categories: 211

I used one_hot_encoding for the TM classifier, and at that step the dataset is split into train and test portions.
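
A minimal sketch of that booleanization step (illustrative; the column and target names are assumptions, and the actual one_hot_encoding helper lives in the PR's code):

```python
# Illustrative one-hot booleanization: every unique user, item, and category
# value becomes one input bit (1193 + 1350 + 211 = 2754 bits for the counts above).
# Note that joined multi-value strings are treated as single categories here.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("noisy_dataset_0.005.csv")
X = pd.get_dummies(df[["user_id", "product_id", "category"]], dtype="uint8")
y = df["rating"]  # hypothetical target column

X_train, X_test, y_train, y_test = train_test_split(
    X.to_numpy(), y.to_numpy(), test_size=0.2, shuffle=False
)
```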

BooBSD commented Dec 27, 2024

@akkadhim
Got it. However, the columns category and user_id contain lists of categories and users joined by the "|" and "," characters (for example: "Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables" or "AH4BURHCF5UQFZR4VJQXBEQCTYVQ,AGSJLPK6HU2FB4HII64NQ3OYFFFA,AGG75KFRXNLCYVRAPA6D4ZBNTNSA"). Why weren't they split into individual unique categories and user IDs? Could you confirm that your booleanization method is correct?
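
For comparison, splitting those joined fields into individual binary features (multi-hot) is straightforward with pandas (illustrative sketch, not the code used in this PR):

```python
# Multi-hot booleanization: split the joined strings so each individual
# category and user ID gets its own input bit.
import pandas as pd

df = pd.read_csv("noisy_dataset_0.005.csv")
cat_bits = df["category"].str.get_dummies(sep="|")   # one bit per category
user_bits = df["user_id"].str.get_dummies(sep=",")   # one bit per user ID
item_bits = pd.get_dummies(df["product_id"])         # one bit per product
X = pd.concat([item_bits, cat_bits, user_bits], axis=1).to_numpy(dtype="uint8")
```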

BooBSD commented Dec 27, 2024

@akkadhim
I tested both booleanization methods (yours and mine) and obtained approximately the same validation accuracy. I split your dataset so that the first 80% is used for training and the last 20% for validation (see the sketch after the log below).

My best validation accuracy:

  • noisy_dataset_0.005.csv: 99.73%
  • noisy_dataset_0.2.csv: 84.87%

Here is the proof:

#1  Accuracy: 83.81%  Best: 83.81%  Training: 1.946s  Testing: 0.107s
#2  Accuracy: 96.69%  Best: 96.69%  Training: 0.609s  Testing: 0.009s
#3  Accuracy: 99.69%  Best: 99.69%  Training: 0.442s  Testing: 0.008s
#4  Accuracy: 99.69%  Best: 99.69%  Training: 0.350s  Testing: 0.007s
#5  Accuracy: 99.69%  Best: 99.69%  Training: 0.279s  Testing: 0.007s
#6  Accuracy: 99.69%  Best: 99.69%  Training: 0.238s  Testing: 0.006s
#7  Accuracy: 99.69%  Best: 99.69%  Training: 0.192s  Testing: 0.006s
#8  Accuracy: 99.69%  Best: 99.69%  Training: 0.178s  Testing: 0.006s
#9  Accuracy: 99.69%  Best: 99.69%  Training: 0.173s  Testing: 0.006s
#10  Accuracy: 99.69%  Best: 99.69%  Training: 0.147s  Testing: 0.005s
....
#300  Accuracy: 99.73%  Best: 99.73%  Training: 0.085s  Testing: 0.003s
#301  Accuracy: 99.69%  Best: 99.73%  Training: 0.090s  Testing: 0.003s
#302  Accuracy: 99.73%  Best: 99.73%  Training: 0.086s  Testing: 0.003s
#303  Accuracy: 99.73%  Best: 99.73%  Training: 0.084s  Testing: 0.003s
#304  Accuracy: 99.73%  Best: 99.73%  Training: 0.081s  Testing: 0.003s
#305  Accuracy: 99.73%  Best: 99.73%  Training: 0.089s  Testing: 0.003s
#306  Accuracy: 99.73%  Best: 99.73%  Training: 0.080s  Testing: 0.003s
#307  Accuracy: 99.73%  Best: 99.73%  Training: 0.081s  Testing: 0.003s
#308  Accuracy: 99.73%  Best: 99.73%  Training: 0.089s  Testing: 0.003s
#309  Accuracy: 99.73%  Best: 99.73%  Training: 0.088s  Testing: 0.003s
#310  Accuracy: 99.69%  Best: 99.73%  Training: 0.083s  Testing: 0.003s
#311  Accuracy: 99.69%  Best: 99.73%  Training: 0.081s  Testing: 0.003s
#312  Accuracy: 99.73%  Best: 99.73%  Training: 0.082s  Testing: 0.003s
#313  Accuracy: 99.73%  Best: 99.73%  Training: 0.079s  Testing: 0.003s
#314  Accuracy: 99.69%  Best: 99.73%  Training: 0.081s  Testing: 0.003s
#315  Accuracy: 99.73%  Best: 99.73%  Training: 0.083s  Testing: 0.003s
#316  Accuracy: 99.73%  Best: 99.73%  Training: 0.088s  Testing: 0.003s
#317  Accuracy: 99.73%  Best: 99.73%  Training: 0.085s  Testing: 0.003s
#318  Accuracy: 99.73%  Best: 99.73%  Training: 0.086s  Testing: 0.003s
#319  Accuracy: 99.73%  Best: 99.73%  Training: 0.088s  Testing: 0.003s
#320  Accuracy: 99.73%  Best: 99.73%  Training: 0.091s  Testing: 0.003s

These results were obtained on a CPU, and it works quite fast.
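
For reference, a minimal sketch of that chronological 80/20 split (illustrative Python, not the code that produced the log above):

```python
# Chronological split: first 80% of rows train, last 20% validate (no shuffling).
import pandas as pd

df = pd.read_csv("noisy_dataset_0.2.csv")
cut = int(0.8 * len(df))
train_df, valid_df = df.iloc[:cut], df.iloc[cut:]
```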

akkadhim (Author) replied:

> @akkadhim Got it. However, the columns category and user_id contain lists of categories and users, joined by the "|" and "," characters [...]. Why weren't they split into individual unique categories and user IDs? Could you confirm that your booleanization method is correct?

For user_id, the CSV formatting rules already handle such cases by enclosing the value in double quotes, while the category column keeps the original structure of the dataset. Splitting these fields would alter the representation of hierarchical categories and their associated user IDs.
Yes, the booleanization is correct.
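
To illustrate the quoting point, this is standard csv-module behavior (minimal example; the user IDs are from the thread, the product ID is a placeholder):

```python
# A comma-joined user_id list round-trips as a single quoted CSV field.
import csv, io

buf = io.StringIO()
csv.writer(buf).writerow(
    ["product_x", "AH4BURHCF5UQFZR4VJQXBEQCTYVQ,AGSJLPK6HU2FB4HII64NQ3OYFFFA"]
)
print(buf.getvalue())
# product_x,"AH4BURHCF5UQFZR4VJQXBEQCTYVQ,AGSJLPK6HU2FB4HII64NQ3OYFFFA"

row = next(csv.reader(io.StringIO(buf.getvalue())))
assert row[1] == "AH4BURHCF5UQFZR4VJQXBEQCTYVQ,AGSJLPK6HU2FB4HII64NQ3OYFFFA"
```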

akkadhim (Author) replied:

> @akkadhim I tested both booleanization methods (yours and mine) and obtained approximately the same validation accuracy. [...] These results were obtained on a CPU, and it works quite fast.

Very impressive! Nice work, @BooBSD!
