Natural Language Processing (Final Exam - 2024)
1. Which of the following is NOT a direct application of SVD in data analysis?
A) Image compression
B) Noise reduction
C) Matrix inversion
D) Data encryption
2. In SVD, a matrix A is decomposed into three matrices U, Sigma, and VT. What does the matrix Sigma represent?
A) The original matrix A
B) The left singular vectors
C) The singular values
D) The right singular vectors
3. How can SVD be used to identify the most important features in a dataset?
A) By analyzing the matrix U
B) By looking at the largest singular values in Sigma
C) By transforming the dataset using VT
D) By reconstructing the dataset with all singular values
4. What role does SVD play in the context of a recommender system?
A) It clusters users into different groups
B) It predicts user preferences by decomposing the user-item interaction matrix
C) It creates new items based on user preferences
D) It increases the number of recommendations a user receives
5. Which matrix in the SVD decomposition of a user-movie interaction matrix represents the latent factors for items?
A) U
B) Sigma
C) VT
D) None of the above
6. How does SVD improve the performance of a recommender system?
A) By directly using user ratings
B) By approximating the user-item interaction matrix to predict missing values
C) By ignoring user-item interactions
D) By increasing the dimensionality of the dataset
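As a point of reference for question 6, here is a minimal NumPy sketch of the idea: decompose a toy user-item matrix, keep only the top-k singular values, and read predicted scores off the low-rank reconstruction. The ratings and the choice of k are made up for illustration.

```python
import numpy as np

# Toy user-item ratings matrix; 0 marks "not yet rated" (an assumption for this sketch).
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

# Decompose R and keep only the k largest singular values / vectors.
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation of R

# The reconstruction fills in values for the entries that were missing in R,
# which can be read as predicted preferences.
missing = (R == 0)
print(np.round(R_hat[missing], 2))
```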
7. In which scenario would you most likely apply SVD in both data analysis and a recommender system?
A) When dealing with large-scale text data for sentiment analysis
B) When analyzing social network graphs for community detection
C) When performing collaborative filtering on a user-item matrix to recommend movies
D) When compressing audio files for better storage
8. Which of the following steps is necessary to apply SVD for a recommender system?
A) Normalizing the user-item matrix
B) Adding random noise to the matrix
C) Removing the largest singular values
D) Transposing the user matrix
9. Which of the following best describes the "truncated SVD"?
A) Using only a subset of the largest singular values and corresponding vectors
B) Discarding all singular values and vectors
C) Applying SVD without normalization
D) Using SVD on a transposed matrix
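For question 9, a short NumPy sketch of truncated SVD: compute the full decomposition, then keep only the k largest singular values and their vectors. The matrix is random and k = 2 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 5))

# Full SVD: A = U @ diag(s) @ Vt, with singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD: keep only the k largest singular values and their vectors.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A_k has rank k and is the best rank-k approximation of A in the Frobenius norm.
print(np.linalg.matrix_rank(A_k), round(np.linalg.norm(A - A_k), 3))
```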
10. How can SVD be integrated with neural networks to enhance the performance of recommender systems?
A) By using SVD to preprocess the data before feeding it into the network
B) By replacing hidden layers with SVD components
C) By using SVD to post-process the neural network output
D) By combining the output of SVD with the input features of the network
11. In the context of Natural Language Processing (NLP), how is SVD used for Latent Semantic Analysis (LSA)?
A) To reduce the dimensionality of term-document matrices and uncover hidden structures in the data
B) To increase the number of terms in the vocabulary
C) To directly translate documents into multiple languages
D) To generate new terms based on the existing vocabulary
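For question 11, a small illustrative pipeline using scikit-learn (assumed available): build a TF-IDF document-term matrix and apply truncated SVD to project documents into a low-dimensional latent semantic space. The documents and the number of components are toy choices.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stock markets fell sharply today",
    "investors sold shares amid market fears",
]

# Document-term matrix weighted by TF-IDF.
X = TfidfVectorizer().fit_transform(docs)

# LSA: truncated SVD projects each document into a small latent "topic" space,
# where semantically related documents end up close together.
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(X)
print(doc_vectors.round(2))
```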
12. In a typical Seq2Seq model using an encoder-decoder architecture, what does the encoder do?
a. Generates output sequences
b. Encodes the input sequence into a fixed-length context vector
c. Translates the input sequence into another language
d. Decodes the context vector into the output sequence
13. Which of the following techniques is often used to improve the performance of Seq2Seq models for long sequences?
a. Data augmentation
b. Attention mechanism
c. Dropout regularization
d. Batch normalization
14. In the context of Seq2Seq models, what is the main advantage of using an attention mechanism?
a. Reduces the number of parameters
b. Allows the model to focus on different parts of the input sequence when generating each part of the output sequence
c. Eliminates the need for training data
d. Speeds up the training process
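For question 14, a minimal NumPy sketch of scaled dot-product attention, the core operation behind attention mechanisms in Seq2Seq models: each decoder query produces a weight distribution over the encoder states and a weighted context vector. All tensors here are random toy data.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to every key
    weights = softmax(scores, axis=-1)   # attention distribution over input positions
    return weights @ V, weights          # context vector plus the weights

# One decoder query attending over a 4-step encoded input sequence (random toy data).
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
context, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))   # one row of weights that sums to 1 over the 4 input positions
```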
15. What is a key difference between traditional Seq2Seq models with RNNs and transformer models?
a. Transformers do not use convolutional layers
b. Transformers rely on self-attention mechanisms rather than recurrence
c. Transformers can only handle fixed-length sequences
d. Transformers are used only for image processing tasks
16. Which component in the transformer architecture is responsible for capturing dependencies between all elements of the input sequence simultaneously?
a. Convolutional layer
b. Recurrent layer
c. Self-attention layer
d. Pooling layer
17. In the transformer architecture, what is the purpose of the positional encoding?
a. To encode the position of tokens in the sequence since the model lacks recurrence
b. To initialize the weights of the model
c. To normalize the input data
d. To reduce the dimensionality of the input data
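For question 17, a small NumPy sketch of the sinusoidal positional encoding used in the original transformer: position-dependent vectors that are added to the token embeddings so that order information survives without recurrence. The sequence length and model dimension are toy values.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]      # token positions 0..seq_len-1
    i = np.arange(d_model)[None, :]        # embedding dimensions 0..d_model-1
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return pe

# These vectors are added to the token embeddings, giving the model a notion
# of token order even though it has no recurrence.
print(positional_encoding(seq_len=4, d_model=8).round(2))
```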
18. Which transformer model has become the foundation for many state-of-the-art natural language processing tasks?
a. BERT
b. AlexNet
c. VGG
d. ResNet
19. Which application is NOT typically associated with Seq2Seq models or transformers?
a. Language translation
b. Text summarization
c. Image classification
20. How do transformers handle long-range dependencies in sequences more effectively than traditional RNNs?
a. By using convolutional layers
b. By employing a hierarchical structure
c. By applying self-attention mechanisms
d. By reducing the sequence length through pooling layers
21. In machine translation using transformers, what role does the encoder play?
a. Encodes the target language sequence
b. Generates the translated output sequence
c. Encodes the source language sequence into contextual embeddings
d. Directly maps input sequences to output sequences without transformation
22. What is one advantage of using the BERT model over traditional Seq2Seq models for natural language understanding tasks?
a. BERT uses bidirectional context, allowing it to understand the context from both directions
b. BERT requires less training data
c. BERT can process images and text simultaneously
d. BERT has fewer parameters, making it faster to train
23. In a transformer-based text generation model, what is the purpose of using masked self-attention in the decoder?
a. To prevent the model from attending to irrelevant parts of the input sequence
b. To ensure the model only attends to previously generated tokens and not future tokens
c. To reduce the computational complexity of the model
d. To improve the model's ability to handle long sequences
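For question 23, a minimal NumPy sketch of the causal (look-ahead) mask used in decoder self-attention: scores for future positions are set to negative infinity before the softmax, so each position can only attend to itself and earlier positions. The scores are random toy values.

```python
import numpy as np

seq_len = 5
# Causal (look-ahead) mask: position i may attend to positions 0..i only.
mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)

# Random toy attention scores; future positions get -inf before the softmax.
scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))
scores[mask] = -np.inf

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.round(2))   # strictly zero above the diagonal: no attention to future tokens
```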
24. In a Seq2Seq model with an attention mechanism applied to text summarization, how does the attention mechanism benefit the summarization task?
a. It reduces the training time required for the model
b. It allows the model to focus on the most relevant parts of the input text when generating the summary
c. It increases the overall length of the generated summary
d. It simplifies the architecture of the model
25. How is the transformer model adapted for the task of question answering in models like BERT?
a. By adding a recurrent layer to handle sequences
b. By fine-tuning the pre-trained model on a question-answering dataset
c. By using transformers only in the encoder part of the model
d. By combining transformers with convolutional neural networks
26. What is a Large Language Model (LLM)?
a. A model that primarily processes image data
b. A model with a large number of parameters designed to understand and generate human language
c. A model used for financial forecasting
d. A model specifically for translating programming languages
27. Which of the following is an example of a Large Language Model?
a. ResNet
b. GPT-3
c. Inception
d. YOLO
[Link] does "GPT" in GPT-3 stand for?
a. General Processing Transformer
b. Graphical Pre-trained Transformer
c. Generative Pre-trained Transformer
d. General Purpose Translator
29. How are Large Language Models typically trained?
a. By using supervised learning with labeled data only
b. By using unsupervised learning with large text corpora
c. By using reinforcement learning exclusively
d. By training on image datasets
30. What is a common application of Large Language Models like GPT-3?
a. Diagnosing medical images
b. Generating human-like text for chatbots
c. Recognizing objects in images
d. Predicting stock market trends
[Link] does the term "pre-training" refer to in the context of LLMs?
a. Training the model on specific, labeled datasets before fine-tuning
b. Training the model on a large, diverse dataset before fine-tuning on a specific task
c. Training the model using only unsupervised methods
d. Initializing the model weights randomly before training
32. Which of the following challenges is associated with Large Language Models?
a. They can easily be run on any consumer-grade hardware
b. They always produce accurate and unbiased results
c. They require significant computational resources and data for training
d. They are easy to interpret and understand
33. What is the primary architecture used in most Large Language Models today?
a. Recurrent Neural Networks (RNNs)
b. Convolutional Neural Networks (CNNs)
c. Transformer networks
d. Support Vector Machines (SVMs)
34. Which Large Language Model is known for its bidirectional context understanding?
a. BERT
b. GPT-2
c. ELMo
d. Transformer-XL
35. How do LLMs like GPT-3 generate text?
a. By predicting the next word in a sequence based on the context of previous words
b. By matching patterns in image data
c. By using predefined templates for text generation
d. By querying a database of pre-written texts
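For question 35, an intentionally tiny sketch of autoregressive generation. A bigram probability table stands in for a real LLM (which conditions on the whole context window), but the loop is the same: predict a distribution over the next token, pick one, append it, repeat. The vocabulary and probabilities are invented for illustration.

```python
import numpy as np

# A tiny bigram table stands in for the language model (hypothetical numbers).
vocab = ["<eos>", "the", "cat", "sat", "on", "mat"]
P = np.array([
    # <eos>  the   cat   sat   on    mat      previous word:
    [0.00, 1.00, 0.00, 0.00, 0.00, 0.00],   # <eos>
    [0.00, 0.00, 0.60, 0.00, 0.00, 0.40],   # the
    [0.10, 0.00, 0.00, 0.90, 0.00, 0.00],   # cat
    [0.10, 0.00, 0.00, 0.00, 0.90, 0.00],   # sat
    [0.00, 1.00, 0.00, 0.00, 0.00, 0.00],   # on
    [1.00, 0.00, 0.00, 0.00, 0.00, 0.00],   # mat
])

# Autoregressive generation: predict a distribution over the next token,
# pick one (greedily here), append it, and repeat.
tokens = ["the"]
while tokens[-1] != "<eos>" and len(tokens) < 10:
    next_probs = P[vocab.index(tokens[-1])]
    tokens.append(vocab[int(np.argmax(next_probs))])

print(" ".join(tokens))   # greedy decoding: "the cat sat on the cat sat on the cat"
```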
36. What is the main difference between GPT-2 and GPT-3?
a. GPT-2 uses a different architecture than GPT-3
b. GPT-3 has significantly more parameters and training data compared to GPT-2
c. GPT-2 is designed for image generation, while GPT-3 is for text generation
d. GPT-2 is a bidirectional model, while GPT-3 is unidirectional
37. Why is the ability to handle long-range dependencies important for LLMs?
a. It allows the model to process images more effectively
b. It enables the model to understand context over long paragraphs of text
c. It improves the model's performance on short text sequences
d. It reduces the computational cost of training the model
[Link] does the term "context window" refer to in LLMs?
a. The maximum length of text the model can consider at once
b. The size of the training dataset
c. The number of layers in the model
d. The amount of computational resources needed for training
39. What is the primary goal of fine-tuning a pretrained LLM?
a. To train the model from scratch on a new dataset
b. To adjust the pretrained model to perform well on a specific downstream task
c. To reduce the model size for deployment
d. To convert the model to handle image data
40. When fine-tuning a pretrained LLM, what does freezing layers mean?
a. Training all layers of the model equally
b. Only updating the weights of certain layers while keeping the others constant
c. Increasing the learning rate for certain layers
d. Reducing the size of the model
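For question 40, a short PyTorch sketch of freezing layers: a small stand-in model plays the role of a pretrained network, all but the final head have requires_grad set to False, and only the remaining trainable parameters are handed to the optimizer. Layer sizes and the learning rate are arbitrary.

```python
import torch
from torch import nn

# Small stand-in for a pretrained network (hypothetical layers for illustration).
model = nn.Sequential(
    nn.Embedding(1000, 64),   # "pretrained" token embeddings
    nn.Linear(64, 64),        # "pretrained" hidden layer
    nn.Linear(64, 2),         # newly added classification head
)

# Freeze everything except the final head: frozen parameters keep their
# pretrained values because gradients are never computed for them.
for param in model[:-1].parameters():
    param.requires_grad = False

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
print(sum(p.numel() for p in trainable), "trainable parameters")
```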
41. Which of the following is a common strategy to prevent overfitting during fine-tuning?
a. Using a very high learning rate
b. Increasing the number of parameters in the model
c. Implementing techniques such as dropout and early stopping
d. Avoiding the use of validation datasets
42. When fine-tuning an LLM for a classification task, which loss function is commonly used?
a. Mean Squared Error (MSE)
b. Cross-Entropy Loss
c. Hinge Loss
d. Mean Absolute Error (MAE)
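For question 42, a minimal PyTorch example of cross-entropy loss on class logits, the usual choice when fine-tuning an LLM with a classification head. The logits and target labels are made-up toy values.

```python
import torch
from torch import nn

# Logits for a batch of 3 examples over 4 classes, plus the true class indices.
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0],
                       [0.2, 1.5, 0.3,  0.0],
                       [0.1, 0.2, 0.3,  2.2]])
targets = torch.tensor([0, 1, 3])

# Cross-entropy = log-softmax over the logits followed by the negative
# log-likelihood of the correct class, averaged over the batch.
loss = nn.CrossEntropyLoss()(logits, targets)
print(round(loss.item(), 4))
```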
43. What is the purpose of using a validation set during the fine-tuning process?
a. To increase the training data size
b. To monitor the model's performance on unseen data and prevent overfitting
c. To determine the final test accuracy
d. To fine-tune the hyperparameters of the model
44. During fine-tuning, why might you choose to use a lower learning rate compared to the pretraining phase?
a. To prevent drastic changes in the pretrained weights
b. To speed up the training process
c. To avoid using too much computational power
d. To ensure the model forgets the pretrained knowledge
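For question 44, a brief PyTorch sketch of one common way to apply a lower learning rate to pretrained weights: optimizer parameter groups give the (stand-in) pretrained encoder a much smaller learning rate than the freshly initialized head. The modules and learning rates are illustrative assumptions.

```python
import torch
from torch import nn

# Hypothetical two-part model: a "pretrained" encoder and a freshly initialized head.
encoder = nn.Linear(64, 64)
head = nn.Linear(64, 2)

# Parameter groups let the pretrained weights move much more slowly than the
# new head, so fine-tuning nudges them instead of overwriting what was learned.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 1e-5},   # gentle updates for pretrained weights
    {"params": head.parameters(), "lr": 1e-3},      # larger steps for the new head
])
print([group["lr"] for group in optimizer.param_groups])
```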
45. What is transfer learning in the context of fine-tuning LLMs?
a. Training a new model from scratch using the same architecture
b. Transferring the knowledge from a pretrained model to a new task
c. Converting a model to handle different types of data
d. Using the model for multiple tasks simultaneously without any modification
46. In the context of fine-tuning LLMs, what is meant by the term "domain adaptation"?
a. Adapting the model to perform in a completely different modality (e.g., from text to image)
b. Adjusting the pretrained model to work effectively on data from a specific domain (e.g., medical texts)
c. Changing the model architecture to handle different tasks
d. Modifying the input data format to match the model requirements
47. What is the purpose of using data augmentation during the fine-tuning of LLMs?
a. To increase the diversity of the training data and prevent overfitting
b. To reduce the size of the training dataset
c. To simplify the model architecture