Dimensionality Reduction: PCA (3)

by BaekDaBang 2024. 6. 9.

1. Load the data

from sklearn.datasets import load_wine

# Wine dataset: 178 samples, 13 features, 3 classes
data = load_wine()
X = data.data
Y = data.target  # class labels

2. Split the data

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1, stratify=Y)

3. Standardize the data

from sklearn.preprocessing import StandardScaler

std = StandardScaler()
# Fit the scaler on the training set only, then reuse its statistics on the test set
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)
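
To confirm that the standardization step behaves as expected, the sketch below repeats steps 1 to 3 in a self-contained form and prints the per-feature mean and standard deviation. The names train_mean, train_scale, and test_mean are introduced here for illustration only and are not part of the original post. The training set should come out at mean 0 and standard deviation 1, while the test set will be close to but not exactly 0, since it is scaled with the training statistics.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Repeat steps 1-3 so this snippet runs on its own
X, Y = load_wine(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=1, stratify=Y)

std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)

# Per-feature statistics after scaling (illustrative variable names)
train_mean = X_train_std.mean(axis=0)
train_scale = X_train_std.std(axis=0)
test_mean = X_test_std.mean(axis=0)

print(np.round(train_mean, 3))   # training means, expected to be ~0
print(np.round(train_scale, 3))  # training standard deviations, expected to be ~1
print(np.round(test_mean, 3))    # test means, near 0 but not exactly 0
```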

4. Reduce the dimensionality

from sklearn.decomposition import PCA

# Project the 13 standardized features onto the first 4 principal components
lpca = PCA(n_components=4)
X_train_pca = lpca.fit_transform(X_train_std)
X_test_pca = lpca.transform(X_test_std)
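
As a quick sanity check on this step, the following sketch repeats steps 1 to 4 and inspects the shape of the projected data along with the fraction of variance each of the 4 components retains. It uses the explained_variance_ratio_ attribute of the fitted PCA object; the variable name ratios is an illustrative assumption, not something from the original post.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Repeat steps 1-4 so this snippet runs on its own
X, Y = load_wine(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=1, stratify=Y)

std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)

lpca = PCA(n_components=4)
X_train_pca = lpca.fit_transform(X_train_std)
X_test_pca = lpca.transform(X_test_std)

# The 13 original features are reduced to 4 columns
print(X_train_pca.shape, X_test_pca.shape)

# Fraction of the training-set variance captured by each component,
# sorted from largest to smallest (illustrative variable name)
ratios = lpca.explained_variance_ratio_
print(np.round(ratios, 3))
print(round(float(ratios.sum()), 3))  # total variance retained by the 4 components
```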

5. Train the model

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X_train_pca, Y_train)
Y_train_pred = lr.predict(X_train_pca)
Y_test_pred = lr.predict(X_test_pca)

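from sklearn import metrics

# Accuracy on the training set and on the test set
print(metrics.accuracy_score(Y_train, Y_train_pred))
print(metrics.accuracy_score(Y_test, Y_test_pred))
# Confusion matrix on the test set (rows: true class, columns: predicted class)
print(metrics.confusion_matrix(Y_test, Y_test_pred))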
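
Steps 3 to 5 can also be chained into a single scikit-learn Pipeline, which makes it harder to fit the scaler or PCA on the test set by accident. This is an alternative sketch, not the approach used in this post; the names pipe, test_acc, and cm are illustrative assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn import metrics

X, Y = load_wine(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=1, stratify=Y)

# Scaler -> PCA -> classifier; fit() only ever sees the training set
pipe = make_pipeline(StandardScaler(), PCA(n_components=4), LogisticRegression())
pipe.fit(X_train, Y_train)

# predict() applies the already fitted scaler and PCA before classifying
Y_test_pred = pipe.predict(X_test)
test_acc = metrics.accuracy_score(Y_test, Y_test_pred)
cm = metrics.confusion_matrix(Y_test, Y_test_pred)

print(test_acc)
print(cm)
```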

6. Choosing the number of principal components