Heart Disease UCI

Checking the data

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

['heart.csv']

Loading 'heart.csv' into memory with the pandas library

dataset_path = '../input/heart.csv'
dataset = pd.read_csv(dataset_path)
print(dataset.head())

   age  sex  cp  trestbps  chol   ...    oldpeak  slope  ca  thal  target
0   63    1   3       145   233   ...        2.3      0   0     1       1
1   37    1   2       130   250   ...        3.5      0   0     2       1
2   41    0   1       130   204   ...        1.4      2   0     2       1
3   56    1   1       120   236   ...        0.8      2   0     2       1
4   57    0   0       120   354   ...        0.6      2   0     2       1

[5 rows x 14 columns]

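Splitting into X and y

Every column is already stored as a binary or numeric value, so no categorical encoding step is applied. The dataset is split into the features X and the target y.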

# The target is the last column; every column before it is a feature.
target_idx = len(dataset.columns) - 1
X = dataset.iloc[:, 0:target_idx]
y = dataset.iloc[:, target_idx]

Split the dataset into a training set and a test set at a ratio of 8:2.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
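
As a quick sanity check on the 8:2 split, the set sizes can be worked out by hand. The sketch below assumes that scikit-learn rounds a fractional test size up to the next whole sample, and it infers the total of 303 rows from the 242 training and 61 test samples that appear in the training and evaluation logs; the variable names are only for illustration.

```python
import math

# Total rows, inferred from the 242 training and 61 test samples in the logs.
n_samples = 242 + 61
test_size = 0.2

# Assumption: a fractional test_size is rounded up to a whole number of rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

With these assumptions the split comes out as 242 training rows and 61 test rows, which agrees with the sample counts in the logs.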

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/data.py:645: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
  return self.partial_fit(X, y)
/opt/conda/lib/python3.6/site-packages/sklearn/base.py:464: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
  return self.fit(X, **fit_params).transform(X)
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:4: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
  after removing the cwd from sys.path.
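
To make the scaling step concrete, here is a minimal pure-Python sketch of standardization: subtract the mean and divide by the standard deviation, with both statistics computed on the training data only and then reused for the test data. It assumes the population standard deviation, which is what StandardScaler uses. The helper names and the tiny train/test split of the five age values printed earlier are hypothetical.

```python
from statistics import pstdev

def fit_standardizer(values):
    """Return the mean and population standard deviation of the values."""
    mean = sum(values) / len(values)
    std = pstdev(values)
    return mean, std

def standardize(values, mean, std):
    """Scale values to zero mean and unit variance using given statistics."""
    return [(v - mean) / std for v in values]

# Hypothetical split of the five 'age' values shown in dataset.head().
train_age = [63, 37, 41, 56]
test_age = [57]

# Fit on the training values only, then apply the same statistics to both sets.
mean, std = fit_standardizer(train_age)
train_scaled = standardize(train_age, mean, std)
test_scaled = standardize(test_age, mean, std)

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics for the test set mirrors the fit_transform / transform pair above and keeps information from the test set out of the preprocessing.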

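Building the artificial neural network

The network has three layers. The number of units, the batch size, and the number of epochs were chosen by experiment, keeping the values that gave the best results.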

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 25, kernel_initializer = 'uniform', activation = 'relu', input_dim = 13))
# Adding the second hidden layer
classifier.add(Dense(units = 15, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 40, epochs = 200)

Using TensorFlow backend.

WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/200
242/242 [==============================] - 1s 3ms/step - loss: 0.6928 - acc: 0.6405
Epoch 2/200
242/242 [==============================] - 0s 44us/step - loss: 0.6921 - acc: 0.7810
Epoch 3/200
242/242 [==============================] - 0s 44us/step - loss: 0.6909 - acc: 0.8264
Epoch 4/200
242/242 [==============================] - 0s 42us/step - loss: 0.6889 - acc: 0.8182
Epoch 5/200
242/242 [==============================] - 0s 43us/step - loss: 0.6857 - acc: 0.8388
Epoch 6/200
242/242 [==============================] - 0s 42us/step - loss: 0.6806 - acc: 0.8306
... (omitted)
Epoch 196/200
242/242 [==============================] - 0s 45us/step - loss: 0.2321 - acc: 0.9132
Epoch 197/200
242/242 [==============================] - 0s 41us/step - loss: 0.2310 - acc: 0.9132
Epoch 198/200
242/242 [==============================] - 0s 47us/step - loss: 0.2300 - acc: 0.9132
Epoch 199/200
242/242 [==============================] - 0s 44us/step - loss: 0.2291 - acc: 0.9132
Epoch 200/200
242/242 [==============================] - 0s 47us/step - loss: 0.2282 - acc: 0.9174

<keras.callbacks.History at 0x7f5dbd208f28>
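
The size of this network and the number of weight updates per epoch can be checked with a little arithmetic. The sketch below assumes that each Dense layer holds one weight per input-output pair plus one bias per unit, and that the last batch of an epoch may be smaller than batch_size. The helper name is hypothetical.

```python
import math

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 13 input features, then the three Dense layers defined above.
layer_sizes = [13, 25, 15, 1]
per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

# 242 training samples in batches of 40; the final batch is a partial one.
batches_per_epoch = math.ceil(242 / 40)

print(per_layer, total_params, batches_per_epoch)
```

Under these assumptions the model has only a few hundred trainable parameters, which is small relative to many networks but still sizeable next to 242 training samples.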

Predicting y with the model

y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
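
The second line turns the sigmoid outputs into class labels by thresholding at 0.5. A minimal sketch of that step in plain Python, using made-up probabilities rather than real model output:

```python
# Hypothetical sigmoid outputs; real values would come from classifier.predict.
probabilities = [0.91, 0.48, 0.50, 0.07, 0.63]
threshold = 0.5

# Only values strictly greater than the threshold count as the positive class.
predicted = [p > threshold for p in probabilities]

print(predicted)
```

Note that a probability of exactly 0.5 falls on the negative side because the comparison is strict.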

Confusion Matrix

Check how much the predicted y differs from the actual y.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
loss, accuracy = classifier.evaluate(X_test, y_test)

61/61 [==============================] - 0s 678us/step

Checking the confusion matrix, loss, and accuracy

print('Loss : ', loss)
print('Accuracy : ', accuracy)
print('Confusion Matrix : \n', cm)

Loss :  0.3430554387999363
Accuracy :  0.9016393550106736
Confusion Matrix : 
 [[23  4]
 [ 2 32]]
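
The reported accuracy can be reproduced from the confusion matrix itself. The sketch below assumes scikit-learn's layout, where rows are the actual classes and columns are the predicted classes, so the top-left cell counts true negatives. Precision and recall are extra metrics that the original run did not print.

```python
# Confusion matrix copied from the output above.
cm = [[23, 4],
      [2, 32]]

# Assumed layout: rows = actual class, columns = predicted class.
(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found

print('Total     : ', total)
print('Accuracy  : ', accuracy)
print('Precision : ', precision)
print('Recall    : ', recall)
```

The four cells sum to the 61 test samples, and 55 correct predictions out of 61 gives roughly 0.9016, in line with the accuracy returned by classifier.evaluate.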

Heart Disease Prediction Using a Deep Learning Artificial Neural Network
