Translate notebook

ruke79 · Jul 31, 2021 · 1a49c6d
1 parent 863f92f
commit 1a49c6d
Showing 1 changed file with 11 additions and 11 deletions.
diff --git a/notebooks/splitting_data.ipynb b/notebooks/splitting_data.ipynb
@@ -4,11 +4,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Splitting data\n",
+    "# 데이터 분할\n",
     "\n",
-    "In the previous [notebook](https://github.com/hundredblocks/ml-powered-applications/blob/master/notebooks/dataset_exploration.ipynb), we explored a dataset. Next, we will separate it into a training and test split. Separating a dataset into splits is crucial to validate the performance of a model. By using only a subset of data to train a model, you can use unseen data to produce an estimate of how your model would perform in practice.\n",
+    "이전 [노트북](https://github.com/rickiepark/ml-powered-applications/blob/master/notebooks/dataset_exploration.ipynb)에서 데이터셋을 탐색했습니다. 이제 이 데이터셋을 훈련 세트와 테스트 세트로 분할하겠습니다. 데이터셋을 분할하는 것은 모델의 성능을 검증하는 데 매우 중요합니다. 데이터의 일부만 사용해 모델을 훈련하면, 모델이 본 적 없는 나머지 데이터로 실전 성능을 추정할 수 있습니다.\n",
     "\n",
-    "In this notebook, I will demonstrate a few ways to do just that using the `writers` Stack Overflow dataset. First, we load and format data."
+    "이 노트북에서는 `writers` 스택 오버플로 데이터셋을 사용해 몇 가지 분할 방법을 소개합니다. 먼저 데이터를 로드하고 전처리합니다."
    ]
   },
   {
@@ -37,9 +37,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Random Split\n",
+    "## 랜덤 분할\n",
     "\n",
-    "The simplest way to generate a test set is to randomly split data between a training set and a test set. This is what we do below."
+    "테스트 세트를 만드는 가장 간단한 방법은 랜덤하게 데이터를 훈련 세트와 테스트 세트로 나누는 것입니다. 방식은 다음과 같습니다."
    ]
   },
   {
@@ -80,20 +80,20 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This approach comes with one drawback, can you guess what it is before moving on to the next section"
+    "이 방식은 한 가지 단점이 있습니다. 다음 섹션으로 넘어가기 전에 이 단점이 무엇인지 생각해 보세요."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Author Split\n",
+    "## 작성자를 기준으로 분할하기\n",
     "\n",
-    "Some authors may be more skilled at asking questions then others. If an author appears in both the training and the test set, a model could successfully predict the performance of their questions simply by successfully identifying the author. Note that simply removing the `AuthorId` from the set of features does not fully solve this problem, as the formulation of a question may be author specific (especially if some authors include their signature). \n",
+    "어떤 작성자는 다른 사람보다 질문을 작성하는 데 더 뛰어날 수 있습니다. 한 작성자가 훈련 세트와 테스트 세트에 모두 등장하면 모델이 작성자를 식별하는 것만으로 질문의 점수를 예측할 수 있습니다. 단순히 특성에서 `AuthorId`를 삭제하는 것만으로는 이 문제를 완전히 해결하지 못합니다. 질문을 작성하는 방식에 작성자의 특징이 담겨 있을 수 있기 때문입니다(특히 일부 작성자가 질문에 서명을 포함하는 경우).\n",
     "\n",
-    "To make sure we are accurately judging question quality, we would want to make sure that a given author only appears in either the training set or the validation set. This guarantee that a model will not be able to leverage information to identify a given author and use it to predict more easily.\n",
+    "질문의 품질을 정확하게 판단하려면 한 작성자가 훈련 세트와 검증 세트 중 한쪽에만 등장해야 합니다. 이렇게 하면 모델이 작성자를 식별할 수 있는 정보를 활용해 쉽게 예측하는 일을 막을 수 있습니다.\n",
     "\n",
-    "To remove this potential source of bias, let's split data by author."
+    "이런 잠재적인 편향의 원인을 제거하기 위해 작성자를 기준으로 분할하겠습니다."
    ]
   },
   {
@@ -129,7 +129,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Going forward we will use the author split, but there are other methods of splitting data for other types of data. For example, we may want to use a time-based split in order to see whether training on questions written in a given period can produce a model that works well on questions from a more recent period. Please refer to the book for more information on those."
+    "앞으로는 작성자를 기준으로 분할하지만, 데이터 유형에 따라 다른 분할 방법도 여러 가지가 있습니다. 예를 들어 특정 기간에 작성된 질문으로 훈련한 모델이 더 최근의 질문에도 잘 동작하는지 확인하기 위해 시간을 기준으로 분할할 수 있습니다. 더 자세한 내용은 책을 참고하세요."
    ]
   }
  ],
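
The notebook's code cells are not part of this diff, so the following is only a minimal sketch of the contrast the translated text describes: a row-level random split versus a split grouped by `AuthorId`. It assumes pandas and scikit-learn are available; the toy DataFrame, the `Score` column, and all variable names are hypothetical, and `GroupShuffleSplit` is one possible way to keep each author on a single side of the split, not necessarily what the notebook itself uses.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Hypothetical toy data: 12 questions written by 4 authors.
df = pd.DataFrame({
    "AuthorId": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "Score": [5, 3, 4, 0, 1, 0, 2, 2, 3, 9, 8, 7],
})

# Random split: rows are shuffled individually, so one author's
# questions can end up in both the training and the test set.
random_train, random_test = train_test_split(
    df, test_size=0.25, random_state=40
)

# Author split: AuthorId is the group key, so all questions from a
# given author stay together on one side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=40)
train_idx, test_idx = next(splitter.split(df, groups=df["AuthorId"]))
author_train, author_test = df.iloc[train_idx], df.iloc[test_idx]

# Count authors that appear on both sides of each split.
shared_random = set(random_train["AuthorId"]) & set(random_test["AuthorId"])
shared_author = set(author_train["AuthorId"]) & set(author_test["AuthorId"])
print("authors on both sides (random split):", len(shared_random))
print("authors on both sides (author split):", len(shared_author))
```

With a grouped split, the overlap count for the author split is zero by construction, while the random split gives no such guarantee. Note that `test_size` in `GroupShuffleSplit` refers to the share of groups (authors), not the share of rows.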