Fix hyperlinks
rickiepark committed Jul 31, 2021
1 parent 30881c3 commit 6b557e8
Showing 1 changed file with 23 additions and 24 deletions.
47 changes: 23 additions & 24 deletions notebooks/second_model.ipynb
@@ -4,9 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training a second model\n",
"# 두 번째 모델 훈련하기\n",
"\n",
"In this notebook, I train a second model using features in order to address the first model's shortcomings."
"이 노트북에서 첫 번째 모델의 단점을 극복하기 위한 특성을 사용해 두 번째 모델을 훈련합니다."
]
},
{
@@ -81,7 +81,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's add new features we've identified as potential candidates in our new model."
"새로운 모델에 도움이 될만한 후보 특성을 추가합니다."
]
},
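The feature-engineering cell that follows is collapsed in this diff. As a rough, hedged illustration of the kind of hand-crafted text features being added, a helper might look like the sketch below; the column names (`body_text`, `text_len`, `num_questions`, `contains_code`) are assumptions made for this example, not the actual `ml_editor` API.

```python
import pandas as pd

def add_simple_text_features(df):
    # Illustrative sketch only: derive a few simple features from the raw question text.
    # Assumes the text lives in a `body_text` column (hypothetical name).
    df = df.copy()
    df["text_len"] = df["body_text"].str.len()                               # question length
    df["num_questions"] = df["body_text"].str.count(r"\?")                   # number of question marks
    df["contains_code"] = df["body_text"].str.contains("```", regex=False)   # code block present?
    return df

# Toy usage example
toy_df = pd.DataFrame({"body_text": ["How do I ask a good question?", "My code ```print(1)``` fails"]})
print(add_simple_text_features(toy_df))
```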
{
@@ -107,7 +107,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Check out the ml_editor source code to see more about what these functions are doing!"
"`ml_editor` 소스 코드를 확인하여 이 함수들의 기능을 확인해 보세요!"
]
},
{
@@ -157,9 +157,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model\n",
"# 모델\n",
"\n",
"Now that we've added new features, let's train a new model. We'll use the same model as before, only the features are different."
"이제 새로운 특성을 추가했으니 새 모델을 훈련해 보죠. 특성만 다르고 이전과 동일한 모델을 사용합니다."
]
},
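The `get_split_by_author` helper used in the next cell keeps every question from a given author on a single side of the split, so performance isn't inflated by author-specific patterns leaking across sets. Below is a minimal sketch of such a split with scikit-learn's `GroupShuffleSplit`, assuming the author id is stored in an `OwnerUserId` column; this is an illustration, not necessarily the exact `ml_editor` implementation.

```python
from sklearn.model_selection import GroupShuffleSplit

def split_by_author(posts, author_col="OwnerUserId", test_size=0.2, random_state=40):
    # One shuffled split in which each author appears in only one of the two sets
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(splitter.split(posts, groups=posts[author_col]))
    return posts.iloc[train_idx], posts.iloc[test_idx]
```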
{
@@ -168,7 +168,7 @@
"metadata": {},
"outputs": [],
"source": [
"# We split again since we have now added all features. \n",
"# 특성을 새로 추가했으므로 데이터셋을 다시 나눕니다.\n",
"train_df, test_df = get_split_by_author(df, test_size=0.2, random_state=40)"
]
},
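The training cell itself sits in the collapsed hunk below, but for the out-of-bag (OOB) training accuracy computed later to work, the classifier must be fit with `oob_score=True`. Here is a self-contained, hedged sketch of that step; the toy data and hyperparameters are illustrative stand-ins, not the repository's actual values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in feature matrix and labels, purely for illustration
rng = np.random.RandomState(40)
X_train = rng.rand(200, 10)
y_train = rng.rand(200) > 0.5

# oob_score=True is what makes clf.oob_decision_function_ available after fitting,
# which the metrics cell below uses to estimate training performance without a refit
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=40)
clf.fit(X_train, y_train)
print("OOB accuracy: %.3f" % clf.oob_score_)
```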
@@ -259,28 +259,27 @@
],
"source": [
"def get_metrics(y_test, y_predicted): \n",
" # true positives / (true positives+false positives)\n",
" # 진짜 양성 / (진짜 양성 + 가짜 양성)\n",
" precision = precision_score(y_test, y_predicted, pos_label=True,\n",
" average='binary') \n",
" # true positives / (true positives + false negatives)\n",
" # 진짜 양성 / (진짜 양성 + 가짜 음성)\n",
" recall = recall_score(y_test, y_predicted, pos_label=True,\n",
" average='binary')\n",
" \n",
" # harmonic mean of precision and recall\n",
" # 정밀도와 재현율의 조화 평균\n",
" f1 = f1_score(y_test, y_predicted, pos_label=True, average='binary')\n",
" \n",
" # true positives + true negatives/ total\n",
" # 진짜 양성 + 진짜 음성 / 전체\n",
" accuracy = accuracy_score(y_test, y_predicted)\n",
" return accuracy, precision, recall, f1\n",
"\n",
"\n",
"\n",
"# Training accuracy\n",
"# Thanks to https://datascience.stackexchange.com/questions/13151/randomforestclassifier-oob-scoring-method\n",
"# 훈련 정확도\n",
"# https://datascience.stackexchange.com/questions/13151/randomforestclassifier-oob-scoring-method 참고\n",
"y_train_pred = np.argmax(clf.oob_decision_function_,axis=1)\n",
"\n",
"accuracy, precision, recall, f1 = get_metrics(y_train, y_train_pred)\n",
"print(\"Training accuracy = %.3f, precision = %.3f, recall = %.3f, f1 = %.3f\" % (accuracy, precision, recall, f1))"
"print(\"훈련 정확도 = %.3f, 정밀도 = %.3f, recall = %.3f, f1 = %.3f\" % (accuracy, precision, recall, f1))"
]
},
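The cell that produces `y_predicted` is collapsed in this diff; it is presumably just the standard predict call on the held-out features, roughly along the lines of the sketch below. `X_test` and `y_test` are assumed to have been built from `test_df` in the same way as the training matrices.

```python
# Hedged sketch of the elided prediction step (variable names are assumptions)
y_predicted = clf.predict(X_test)
accuracy, precision, recall, f1 = get_metrics(y_test, y_predicted)
```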
{
@@ -298,14 +297,14 @@
],
"source": [
"accuracy, precision, recall, f1 = get_metrics(y_test, y_predicted)\n",
"print(\"Validation accuracy = %.3f, precision = %.3f, recall = %.3f, f1 = %.3f\" % (accuracy, precision, recall, f1))"
"print(\"검증 정확도 = %.3f, 정밀도 = %.3f, recall = %.3f, f1 = %.3f\" % (accuracy, precision, recall, f1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fortunately, this model shows stronger aggregate performance than our previous model! Let's save our new model and vectorizer to disk so we can use them later."
"다행히 이 모델은 이전 모델보다 성능이 더 높습니다! 새로운 모델과 벡터화 객체를 나중에 사용하기 위해 디스크에 저장하겠습니다."
]
},
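The serialization cell is collapsed in this diff; `joblib` is the usual tool for persisting scikit-learn estimators. A minimal sketch follows, assuming the artifacts go to a `models/` folder next to the notebooks and that the vectorizer variable is named `vectorizer` (both assumptions for this example).

```python
from pathlib import Path
import joblib

# Illustrative paths; adjust to wherever the project stores serialized artifacts
model_path = Path("../models/model_2.pkl")
vectorizer_path = Path("../models/vectorizer_2.pkl")

joblib.dump(clf, model_path)
joblib.dump(vectorizer, vectorizer_path)

# They can later be restored with joblib.load, e.g. in the inference code
clf_restored = joblib.load(model_path)
```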
{
@@ -335,9 +334,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Validating that features are useful\n",
"## 특성의 유용성 검증하기\n",
"\n",
"Next, we'll use the method described in the feature importance [notebook](https://github.com/hundredblocks/ml-powered-applications/blob/master/notebooks/feature_importance.ipynb) to validate that our new features are being used by the new model."
"그다음 특성 중요도 [노트북](https://github.com/hundredblocks/ml-powered-applications/blob/master/notebooks/feature_importance.ipynb)에서 설명한 방법을 사용해 새로운 특성을 새로운 모델이 사용하는지 확인해 보겠습니다."
]
},
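The `get_feature_importance` helper called two cells below pairs each feature name with the trained forest's importance score and sorts the pairs from most to least important. A plausible sketch of such a helper is shown here; it is an illustration, not necessarily the exact `ml_editor` implementation.

```python
def get_feature_importance(clf, feature_names):
    # Pair each feature name with the random forest's impurity-based importance,
    # sorted in descending order of importance
    importances = clf.feature_importances_
    order = importances.argsort()[::-1]
    return [(feature_names[i], importances[i]) for i in order]
```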
{
@@ -412,27 +411,27 @@
],
"source": [
"k = 20\n",
"print(\"Top %s importances:\\n\" % k)\n",
"print(\"상위 %s개 중요도:\\n\" % k)\n",
"print('\\n'.join([\"%s: %.2g\" % (tup[0], tup[1]) for tup in get_feature_importance(clf, all_feature_names)[:k]]))\n",
"\n",
"print(\"\\nBottom %s importances:\\n\" % k)\n",
"print(\"\\n하위 %s개 중요도:\\n\" % k)\n",
"print('\\n'.join([\"%s: %.2g\" % (tup[0], tup[1]) for tup in get_feature_importance(clf, all_feature_names)[-k:]]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our new features are amongst the most predictive! On the flip side, we can see that the word vectors from the TF-IDF vectorization approach don't seem to be particularly helpful. In a following [notebook](https://github.com/hundredblocks/ml-powered-applications/blob/master/notebooks/third_model.ipynb), we will train a third model without these features and see how well it performs."
"새로운 특성이 가장 예측 성능이 좋은 편이군요! 반대로 TF-IDF 벡터화로 얻은 단어 벡터는 특별히 도움이 되는 것 같지 않습니다. 이어지는 [노트북](https://github.com/hundredblocks/ml-powered-applications/blob/master/notebooks/third_model.ipynb)에서 이런 특성을 제외하고 세 번째 모델을 훈련하여 어떤 성능을 내는지 확인해 보습니다."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparing predictions to data\n",
"## 예측과 데이터 비교하기\n",
"\n",
"This section uses the evaluation methods described in the Comparing Data To Predictions [notebook](https://github.com/hundredblocks/ml-powered-applications/blob/master/notebooks/comparing_data_to_predictions.ipynb), but on our new model."
"이 섹션은 새로운 모델로 데이터와 예측 비교하기 [노트북](comparing_data_to_predictions.ipynb)에서 설명한 평가 방법을 사용합니다."
]
},
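The evaluation cells that follow are collapsed in this diff. The referenced notebook leans on standard scikit-learn tooling, so the gist is roughly the sketch below; variable names such as `X_test` and `y_predicted` are assumptions carried over from earlier cells.

```python
from sklearn.metrics import confusion_matrix
from sklearn.calibration import calibration_curve

# Confusion matrix of the new model's test-set predictions
print(confusion_matrix(y_test, y_predicted))

# Calibration curve: predicted probability of the positive class vs. observed frequency
y_proba = clf.predict_proba(X_test)[:, 1]
prob_true, prob_pred = calibration_curve(y_test, y_proba, n_bins=10)
print(prob_true, prob_pred)
```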
{
