
New method to load data and new release 0.0.23 #9

Merged: 12 commits merged on Dec 4, 2024


@gaetanbrison (Collaborator) commented Dec 4, 2024

Hi Jun 👋

This PR adds a new method to load all benchmark datasets from Hugging Face, as follows:

```python
# Import the dataset loader from Hugging Face and the label encoder from scikit-learn
from datasets import load_dataset
from sklearn.preprocessing import LabelEncoder

# Load the dataset from the Hugging Face Hub, specifying the source file and the split
dataset = load_dataset('inria-soda/carte-benchmark', data_files='data_raw/michelin.csv', split='train')

# Convert the loaded dataset to a pandas DataFrame for easier manipulation
df = dataset.to_pandas()

# Encode the target column ("Award") as integer class labels for machine learning tasks
encoder = LabelEncoder()
df["Award"] = encoder.fit_transform(df["Award"])

# Define the number of training samples/entities and set a random state for reproducibility
num_train = 128  # Number of training groups or entities
random_state = 1  # Random seed for reproducibility
target_name = "Award"  # Target column name
entity_name = "Name"  # Entity column name

# Split the dataset into training and testing sets with the custom set_split_hf helper
# (not defined in this snippet), passing the seed defined above instead of a hard-coded value
X_train, X_test, y_train, y_test = set_split_hf(df, target_name, entity_name, num_train, random_state=random_state)
```
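`LabelEncoder` gives each distinct label an integer code following the sorted order of the unique labels, and `inverse_transform` maps codes back. The stdlib-only sketch below mimics that mapping so the encoding of the target column is easy to inspect; the function names and label values are hypothetical placeholders, not taken from `michelin.csv` or from the project.

```python
def fit_label_codes(labels):
    """Map each distinct label to an integer in sorted label order (as LabelEncoder does)."""
    classes = sorted(set(labels))
    return classes, {label: code for code, label in enumerate(classes)}


def encode(labels, mapping):
    """Replace every label by its integer code."""
    return [mapping[label] for label in labels]


def decode(codes, classes):
    """Map integer codes back to the original labels (like inverse_transform)."""
    return [classes[code] for code in codes]


# Hypothetical labels, used only for illustration
awards = ["gold", "bronze", "silver", "gold"]
classes, mapping = fit_label_codes(awards)
codes = encode(awards, mapping)
print(classes)  # ['bronze', 'gold', 'silver']
print(codes)    # [1, 0, 2, 1]
print(decode(codes, classes))
```

With the real encoder, `encoder.classes_` holds the same sorted list of labels, and `encoder.inverse_transform(predictions)` recovers the original `Award` strings from integer model outputs.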
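The body of `set_split_hf` is not shown in this PR. As a hypothetical illustration only, the sketch below performs an entity-level split, assuming `num_train` counts distinct entities (values of the `entity_name` column) sent to the training set and that all rows of one entity stay on the same side of the split. `entity_split_indices` is an illustrative name, not the project's API.

```python
import random


def entity_split_indices(entities, num_train, random_state=None):
    """Return (train_idx, test_idx) row indices with no entity on both sides.

    Hypothetical sketch: `num_train` is read as the number of distinct
    entities placed in the training set.
    """
    # Sort the unique entities so the seeded draw does not depend on set ordering
    unique_entities = sorted(set(entities))
    if not 0 < num_train < len(unique_entities):
        raise ValueError("num_train must be between 1 and the number of entities - 1")
    rng = random.Random(random_state)  # seeded generator for reproducibility
    train_entities = set(rng.sample(unique_entities, num_train))
    # Every row follows its entity, so no entity leaks into both sets
    train_idx = [i for i, e in enumerate(entities) if e in train_entities]
    test_idx = [i for i, e in enumerate(entities) if e not in train_entities]
    return train_idx, test_idx


# Hypothetical entity column: rows 0 and 2 share the entity "a"
names = ["a", "b", "a", "c", "d"]
train_idx, test_idx = entity_split_indices(names, num_train=2, random_state=1)
print(train_idx, test_idx)
```

Applied to the DataFrame above, the indices could be used as `df.iloc[train_idx]` and `df.iloc[test_idx]`, then separated into features and `df[target_name]`. Grouping by entity keeps rows that share an entity out of both sets, which avoids leakage when an entity appears in several rows.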

@gaetanbrison merged commit 3812b4d into main on Dec 4, 2024
0 of 8 checks passed