본문 바로가기
Computer Science/Machine Learning

KNN(K-Nearest Neighbors) [재배환경 별 작물 종류 예측] (4)

by BaekDaBang 2024. 3. 23.

0. Dataset

import numpy as np
import pandas as pd

train_data = pd.read_csv("/kaggle/input/2023-ml-w4p1/train.csv")
test_data = pd.read_csv("/kaggle/input/2023-ml-w4p1/test.csv")
x_train = train_data.iloc[:,:-1]
y_train = train_data.iloc[:,-1]
x_test = test_data

 

Train Data

 

Test Data

 

1. Label map 

label_map = list()

for label in y_train:
    if label not in label_map:
        label_map.append(label)
y_train_label = list()

for pred in y_train:
    for n,label in enumerate(label_map):
        if pred is label:
            y_train_label.append(n)
            break

y_train = y_train_label

 

2. Data Normalization

from sklearn.preprocessing import StandardScaler

std = StandardScaler()

x_train = std.fit_transform(x_train)
x_test = std.transform(x_test)

 

3. Inference

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
knn.fit(x_train, y_train)

y_test = list()
y_test_label = knn.predict(x_test)

for pred in y_test_label:
    y_test.append(label_map[int(pred)])

 

4. Submission

submission = pd.read_csv('/kaggle/input/2023-ml-w4p1/sample_submit.csv')
submission['label'] = y_test
submission.to_csv('submission.csv',index = False)