Machine Learning

Q1. You are part of a data science team that is working for a national fast-food chain. You create a simple report that shows a trend: Customers who visit the store more often and buy smaller meals spend more than customers who visit less frequently and buy larger meals. What is the most likely diagram that your team created?

Q2. You work for an organization that sells a spam filtering service to large companies. Your organization wants to transition its product to use machine learning. It currently uses a list of 250,00 keywords. If a message contains more than a few of these keywords, then it is identified as spam. What would be one advantage of transitioning to machine learning?

Q3. You work for a music streaming service and want to use supervised machine learning to classify music into different genres. Your service has collected thousands of songs in each genre, and you used this as your training data. Now you pull out a small random subset of all the songs in your service. What is this subset called?

Q4. In traditional computer programming, you input commands. What do you input with machine learning?

Q5. Your company wants to predict whether existing automotive insurance customers are more likely to buy homeowners insurance. It created a model to better predict the best customers to contact about homeowners insurance, and the model had a low variance but high bias. What does that say about the data model?


Q6. You want to identify global weather patterns that may have been affected by climate change. To do so, you want to use machine learning algorithms to find patterns that would otherwise be imperceptible to a human meteorologist. What is the best place to start?

Q7. You work in a data science team that wants to improve the accuracy of its K-nearest neighbor result by running it on top of a naive Bayes result. What is this an example of?

Q8. ____ looks at the relationship between predictors and your outcome.

Q9. What is an example of a commercial application for a machine learning system?

Q10. What does this image illustrate?

Machine Learning Q10

Q11. You work for a power company that owns hundreds of thousands of electric meters. These meters are connected to the internet and transmit energy usage data in real time. Your supervisor asks you to direct a project to use machine learning to analyze this usage data. Why are machine learning algorithms ideal in this scenario?

Q12. To predict a quantity value, use ___.

Q13. Why is naive Bayes called naive?

Q14. You work for an ice cream shop and created the chart below, which shows the relationship between the outside temperature and ice cream sales. What is the best description of this chart?

Machine Learning Q14

Q16. How do machine learning algorithms make more precise predictions?

Q17. You work for an insurance company. Which machine learning project would add the most value for the company?

Q18. What is the missing information in this diagram?

Machine Learning Q18

Q19. What is one reason not to use the same data for both your training set and your testing set?

Q20. Your university wants to use machine learning algorithms to help sort through incoming student applications. An administrator asks if the admissions decisions might be biased against any particular group, such as women. What would be the best answer?

Explanation: While machine learning algorithms don’t have bias themselves, the data they are trained on can.

Q21. What is stacking?

Q22. You want to create a supervised machine learning system that identifies pictures of kittens on social media. To do this, you have collected more than 100,000 images of kittens. What is this collection of images called?

Q23. You are working on a project that involves clustering together images of different dogs. You take an image and identify it as your centroid image. What type of machine learning algorithm are you using?

Explanation: The problem explicitly states “clustering”.

Q24. Your company wants you to build an internal email text prediction model to speed up the time that employees spend writing emails. What should you do?

Q25. Your organization allows people to create online professional profiles. A key feature is the ability to create clusters of people who are professionally connected to one another. What type of machine learning method is used to create these clusters?

Q26. What is this diagram a good example of?

Machine Learning Q26

Note: there are centres of clusters (C0, C1, C2).

Q27. Random forest is a modified and improved version of which earlier technique?

Q28. Self-organizing maps are specialized neural networks for which type of machine learning?

Q29. Which statement about K-means clustering is true?

Q30. You created a machine learning system that interacts with its environment and responds to errors and rewards. What type of machine learning system is it?

Q31. Your data science team must build a binary classifier, and the number one criterion is the fastest possible scoring at deployment. It may even be deployed in real time. Which technique will produce a model that will likely be fastest for the deployment team to use on new cases?

Q32. Your data science team wants to use the K-nearest neighbor classification algorithm. Someone on your team wants to use a K of 25. What are the challenges of this approach?

Q33. Your machine learning system is attempting to describe a hidden structure from unlabeled data. How would you describe this machine learning method?

Q34. You work for a large credit card processing company that wants to create targeted promotions for its customers. The data science team created a machine learning system that groups together customers who made similar purchases, and divides those customers based on customer loyalty. How would you describe this machine learning approach?

Q35. You are using K-nearest neighbor and you have a K of 1. What are you likely to see when you train the model?

Q36. Are data model bias and variance a challenge with unsupervised learning?

Q37. Which choice is best for binary classification?

Explanation: Logistic regression is far better than linear regression at binary classification because its output is pushed toward one extreme or the other (0 or 1). K-means clustering can be used for classification but is not as accurate in most scenarios.
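To illustrate the point about logistic regression, here is a minimal sketch in plain Python. The toy data, the learning rate, and the helper names `sigmoid` and `train_logistic` are assumptions made for this example, not part of the quiz.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.3, steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical toy data: one feature, and the label flips between x=2 and x=3.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]
for x, p in zip(xs, probs):
    print(x, round(p, 3))
```

Every prediction stays strictly between 0 and 1, which is why the output can be read as a class probability. A straight line fitted to the same labels has no such bound.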

Q38. With traditional programming, the programmer typically inputs commands. With machine learning, the programmer inputs

Explanation: This one is pretty straightforward and a fundamental concept.

Q39. Why is it important for machine learning algorithms to have access to high-quality data?

Q40. In K-nearest neighbor, the closer you are to a neighbor, the more likely you are to

Q41. In the HBO show Silicon Valley, one of the characters creates a mobile application called Not Hot Dog. It works by having the user take a photograph of food with their mobile device. Then the app says whether the food is a hot dog. To create the app, the software developer uploaded hundreds of thousands of pictures of hot dogs. How would you describe this type of machine learning?

Q42. You work for a large pharmaceutical company whose data science team wants to use unsupervised machine learning algorithms to help discover new drugs. What is an advantage to this approach?

Explanation: This one is similar to an example talked about in the Stanford Machine Learning course.

Q43. In 2015, Google created a machine learning system that could beat a human in the game of Go. This extremely complex game is thought to have more gameplay possibilities than there are atoms in the universe. The first version of the system won by observing hundreds of thousands of hours of human gameplay; the second version learned how to play by getting rewards while playing against itself. How would you describe this transition to different machine learning approaches?

Q44. The security company you work for is thinking about adding machine learning algorithms to their computer network threat detection appliance. What is one advantage of using machine learning?

Q45. You work for a hospital that is tracking the community spread of a virus. The hospital created a smartwatch application that uploads body temperature data from hundreds of thousands of participants. What is the best technique to analyze the data?

Q46. Many of the advances in machine learning have come from improved ___.

Q47. What is this diagram a good example of?

Machine Learning Q45

Q48. Naive Bayes looks at each ___ predictor and creates a probability that it belongs in each class.


Q49. Someone on your data science team recommends that you use decision trees, naive Bayes and K-nearest neighbor, all at the same time, on the same training data, and then average the results. What is this an example of?

Q50. Your data science team wants to use machine learning to better filter out spam messages. The team has gathered a database of 100,000 messages that have been identified as spam or not spam. If you are using supervised machine learning, what would you call this data set?

Q51. You work for a website that enables customers to see all images of themselves on the internet by uploading one self-photo. Your data model uses 5 characteristics to match people to their photo: color, eye, gender, eyeglasses and facial hair. Your customers have been complaining that they get tens of thousands of photos that do not include them. What is the problem?

Q52. Your supervisor asks you to create a machine learning system that will help your human resources department classify job applicants into well-defined groups. What type of system are you more likely to recommend?

Q53. You and your data science team have 1 TB of example data. What do you typically do with that data?

Q54. Your data science team is working on a machine learning product that can act as an artificial opponent in video games. The team is using a machine learning algorithm that focuses on rewards: If the machine does some things well, then it improves the quality of the outcome. How would you describe this type of machine learning algorithm?

Q55. Training the model with data in one single batch is known as ___.

Q56. Which of the following is NOT supervised learning?

Q57. Suppose we would like to perform clustering on spatial data such as the geometrical locations of houses. We wish to produce clusters of many different sizes and shapes. Which of the following methods is the most appropriate?

Q58. The error function most suited for gradient descent using logistic regression is

Q59. Compared to the variance of the Maximum Likelihood Estimate (MLE), the variance of the Maximum A Posteriori (MAP) estimate is ___

Q60. ___ refers to a model that can neither model the training data nor generalize to new data.

Q61. How would you describe this type of classification challenge?

Machine Learning Q58

Q62. What does it mean to underfit your data model?

Explanation: Underfitted data models usually have high bias and low variance. Overfitted data models have low bias and high variance.
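The contrast between underfitting and overfitting can be sketched with two deliberately extreme models. The data generator, the constant predictor, and the 1-nearest-neighbor memorizer below are illustrative assumptions chosen for this example. They are one way to show high bias versus high variance, not the only one.

```python
import random

random.seed(0)

def make_data(n):
    """Hypothetical noisy samples of y = x**2 on the interval [-3, 3]."""
    xs = [random.uniform(-3, 3) for _ in range(n)]
    return [(x, x * x + random.gauss(0, 1)) for x in xs]

def mse(model, data):
    """Mean squared error of a model (a function of x) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train, test = make_data(30), make_data(30)

# High bias, low variance: ignores x and always predicts the training mean.
mean_y = sum(y for _, y in train) / len(train)
def underfit(x):
    return mean_y

# Low bias, high variance: memorizes the training set (1-nearest neighbor).
def overfit(x):
    return min(train, key=lambda point: abs(point[0] - x))[1]

for name, model in (("underfit", underfit), ("overfit", overfit)):
    print(name, "train:", round(mse(model, train), 3), "test:", round(mse(model, test), 3))
```

The memorizing model reproduces its training data exactly, so its training error is zero, while the constant model cannot even fit the training set. Comparing the printed train and test errors shows how each extreme behaves on unseen data.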

Q63. An Asian user complains that your company’s facial recognition model does not properly identify their facial expressions. What should you do?

Explanation: The answer is self-explanatory: if Asian users are the only group of people making the complaint, then the training data should have more Asian faces.

Q64. You work for a website that helps match people up for lunch dates. The website boasts that it uses more than 500 predictors to find customers the perfect date, but many customers complain that they get very few matches. What is a likely problem with your model?

Explanation: This question is very similar to Q51 but involves a polar opposite scenario.

That answer is somewhat vague and unsettled. A small number of matches does not necessarily imply that the model overfits, especially given 500 (!) independent variables. To me, it sounds more reasonable that the matching threshold might be too tight, allowing only a small number of matches to occur. So a solution could be either loosening the threshold criterion or increasing the number of candidates.

Q65. (Mostly) whenever we see kernel visualizations online (or some other reference) we are actually seeing:

Q66. The activations for class A, B and C before softmax were 10, 8 and 3. The difference in softmax values for class A and class B would be:
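The arithmetic in Q66 can be checked with a short sketch. The `softmax` helper below is written for this example and uses only the standard library. Subtracting the maximum activation first is a common numerical-stability step and does not change the result.

```python
import math

def softmax(logits):
    """Convert raw activations into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

p_a, p_b, p_c = softmax([10, 8, 3])
difference = p_a - p_b
print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(difference, 4))
```

With these activations, class A receives roughly 0.88 of the probability mass and class B roughly 0.12, so the difference comes out at about 0.76.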


Q67. The new dataset you have just scraped seems to exhibit lots of missing values. What action will help you minimize that problem?

Q68. Which of the following methods can be used either as an unsupervised learning or as a dimensionality reduction technique?

Q69. What is the main motivation for using activation functions in ANN?

Q70. Which loss function would fit best in categorical (discrete) supervised learning?

Q71. What is the correct option?


no.  Red               Blue              Green
1.   Validation error  Training error    Test error
2.   Training error    Test error        Validation error
3.   Optimal error     Validation error  Test error
4.   Validation error  Training error    Optimal error

Q72. You create a decision tree to show whether someone decides to go to the beach. There are three factors in this decision: rainy, overcast, and sunny. What are these three factors called?

// These nodes determine whether someone decides to go to the beach; for example, if it is rainy, people will mostly refrain from going to the beach.

Q73. You need to quickly label thousands of images to train a model. What should you do?

Q74. The fit line and data in the figure exhibit which pattern?


Q75. You need to select a machine learning process to run a distributed neural network on a mobile application. Which would you choose?

Q76. Which choice is the best example of labeled data?

Q77. In statistics, what is defined as the probability of a hypothesis test of finding an effect - if there is an effect to be found?

Q78. You want to create a machine learning algorithm to identify food recipes on the web. To do this, you create an algorithm that looks at different conditional probabilities. So if the post includes the word flour, it has a slightly stronger probability of being a recipe. If it contains both flour and sugar, it is even more likely to be a recipe. What type of algorithm are you using?

Q79. What is lazy learning?

Q80. What is Q-learning reinforcement learning?

Explanation: Q-learning is a model-free reinforcement learning algorithm. It is a value-based learning algorithm: value-based algorithms update the value function using an equation (in particular, the Bellman equation).
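The Bellman-style update behind Q-learning can be sketched in a few lines. The two-state table, the action names, and the `alpha` and `gamma` values below are hypothetical choices for illustration only.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

# Hypothetical two-state, two-action value table, initialized to zero.
q = {s: {"left": 0.0, "right": 0.0} for s in ("s0", "s1")}

first = q_update(q, "s0", "right", reward=1.0, next_state="s1")
second = q_update(q, "s0", "right", reward=1.0, next_state="s1")
print(first, second)
```

No model of the environment appears anywhere in the update. The table is adjusted only from observed transitions and rewards, which is what "model-free" refers to.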

Q81. The data in your model has low bias and low variance. How would you expect the data points to be grouped together on the diagram?


Q82. Your machine learning system is using labeled examples to try to predict future data, compare that data to the predicted result, and then adjust the model. What is the best description of this machine learning method?


Q83. In the 1983 movie WarGames, the computer learns how to master the game of chess by playing against itself. What machine learning method was the computer using?


Q84. You are working with your machine learning algorithm on something called class predictor probability. What algorithm are you most likely using?

Explanation: You could use a naïve Bayes algorithm to differentiate three classes of dog breeds: terrier, hound, and sport dogs. Each class has three predictors: hair length, height, and weight. The algorithm does something called class predictor probability.
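The dog-breed example can be sketched as a small Gaussian naive Bayes classifier. All measurements below are invented for illustration, and modeling each predictor as a Gaussian is an assumption of this sketch. The explanation above does not say which distribution is used.

```python
import math

# Hypothetical measurements: (hair length cm, height cm, weight kg).
data = {
    "terrier": [(5.0, 30.0, 7.0), (6.0, 32.0, 8.0), (4.0, 28.0, 6.0)],
    "hound": [(2.0, 60.0, 25.0), (3.0, 65.0, 30.0), (2.5, 62.0, 27.0)],
    "sport": [(4.0, 55.0, 30.0), (5.0, 58.0, 34.0), (4.5, 52.0, 28.0)],
}

def fit(data):
    """Per class: a prior plus a (mean, variance) pair for every predictor."""
    total = sum(len(rows) for rows in data.values())
    model = {}
    for label, rows in data.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / total, stats)
    return model

def log_gaussian(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def posterior(model, sample):
    """Class probabilities, treating predictors as independent given the class."""
    scores = {}
    for label, (prior, stats) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, (m, v) in zip(sample, stats)
        )
    top = max(scores.values())
    weights = {k: math.exp(s - top) for k, s in scores.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}

model = fit(data)
probs = posterior(model, (5.0, 30.0, 7.0))
print(max(probs, key=probs.get))
```

Each predictor contributes its own per-class likelihood, and the contributions are simply multiplied (added in log space). That independence assumption is the "naive" part.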


Q85. What is one of the most effective ways to correct for underfitting your model to the data?

Q86. Your data science team is often criticized for creating reports that are boring or too obvious. What could you do to help improve the team?

Q87. What is the difference between unstructured and structured data?

Q88. You work for a startup that is trying to develop a software tool that will scan the internet for pictures of people using specific tools. The chief executive is very interested in using machine learning algorithms. What would you recommend as the best place to start?

Q89. In supervised machine learning, data scientists often have the challenge of balancing between underfitting and overfitting their data model. They often have to adjust the training set to make better predictions. What is this balance called?

Q90. What is conditional probability?

Q91. K-means clustering is what type of machine learning algorithm?

Q92. What is ensemble modeling?

Q93. What is the best definition for bias in your data model?

Q94. Which project might be best suited for supervised machine learning?

Q95. When is a decision tree most commonly used?

Q96. An organisation that owns dozens of shopping malls wants to create a machine learning product that will use facial recognition to identify customers. What is the main challenge of developing such a model?

Q97. Which of the following machine learning algorithms is unsupervised?

Explanation: During training, k-means partitions observations into k clusters. During inference, it assigns a given data point to the nearest cluster by distance. k-means is unsupervised, because it doesn’t require labeled data to be trained.
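The training and inference steps described above can be sketched as follows. The points and the initial centroids are hypothetical, and passing the starting centroids in explicitly (rather than choosing them at random) is a simplification to keep the example deterministic.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point (the inference step)."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = k_means(points, centroids=[(0, 0), (10, 10)])
print(assignments, nearest((9, 9), centroids))
```

No labels are passed in at any point. The cluster indices come purely from distances, which is why the method counts as unsupervised.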

Q98. Averaging the output of multiple decision trees helps to:

Explanation: Averaging models leads to higher stability and a lower variance than individual models. Mathematically, for $N$ independent estimates with equal variance, remember that $\text{Var}(\bar{X})=\frac{\text{Var}(X)}{N}$
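The variance formula can be checked with a small simulation. This sketch assumes each model output is an independent draw with variance 1. Real decision trees trained on overlapping data are correlated, so the reduction seen in practice is usually smaller. `N` and `TRIALS` are arbitrary illustrative values.

```python
import random
import statistics

random.seed(42)

N = 10         # hypothetical number of models being averaged
TRIALS = 5000  # number of simulated ensembles

# Output of a single model: one independent draw with variance 1.
single = [random.gauss(0, 1) for _ in range(TRIALS)]

# Output of an ensemble: the average of N independent draws.
averaged = [sum(random.gauss(0, 1) for _ in range(N)) / N for _ in range(TRIALS)]

var_single = statistics.pvariance(single)
var_averaged = statistics.pvariance(averaged)
print(round(var_single, 3), round(var_averaged, 3), 1 / N)
```

The empirical variance of the averaged output should land close to `1 / N`, in line with the formula.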

Q99. To optimize your objective function, you are performing full batch gradient descent using the entire training set (not stochastic gradient descent). Is it required to shuffle your training set?

Explanation: At every iteration, full batch gradient descent uses the entire training set to compute a gradient. The order in which data is processed doesn’t impact the gradient value.
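A quick way to see why shuffling does not matter for full batch gradient descent is to compute the gradient on the same data in two different orders. The linear model, the mean-squared-error loss, and the synthetic data below are assumptions made for this sketch.

```python
import math
import random

def full_batch_gradient(w, b, data):
    """Gradient of mean squared error for y_hat = w*x + b over ALL examples."""
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return grad_w, grad_b

random.seed(1)
# Hypothetical data following y = 3x + 1 with a little noise.
data = [(x, 3 * x + 1 + random.gauss(0, 0.1)) for x in range(20)]
shuffled = data[:]
random.shuffle(shuffled)

g_original = full_batch_gradient(0.5, 0.0, data)
g_shuffled = full_batch_gradient(0.5, 0.0, shuffled)

# Equal up to floating-point rounding: a sum does not depend on term order.
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(g_original, g_shuffled))
print(same)
```

With stochastic or mini-batch gradient descent the picture changes, because each step sees only part of the data and the order decides which part.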


Q100. You’ve received 1,000,000 images and have split them 96%/2%/2% among train, dev and test sets. You’ve trained your model, and analyzed the results. After working further on the problem, you’ve decided to correct the incorrectly labeled data on the dev set.

Which of these statements do you agree with?

Explanation: It is important that your dev and test set have the closest possible distribution to “real” data.

Q101. You’re working on a binary classification task, to classify if an image contains a cat (“1”) or doesn’t contain a cat (“0”). What loss would you choose to minimize in order to train a model?

Explanation: You are trying to minimize the binary cross entropy loss over the training set.
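For reference, binary cross entropy can be written out directly. The helper name `binary_cross_entropy`, the clipping constant `eps`, and the sample predictions are assumptions made for this sketch.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# Labels: 1 = cat, 0 = no cat. Predictions are hypothetical model outputs.
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
undecided = binary_cross_entropy([1], [0.5])
print(round(confident_right, 4), round(confident_wrong, 4), round(undecided, 4))
```

The loss is small when the predicted probability agrees with the label and grows sharply for confident mistakes, which is the behavior you want to penalize when training a cat/no-cat classifier.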

Q102. You want to create a machine learning algorithm that finds the top 100 people who have shared photographs of themselves on social media. What is the best machine learning method to use?

Q103. The famous data scientist Andrew Ng has been quoted as saying, “Applied machine learning is basically feature engineering.” What is feature engineering?

Q104. In the context of calculus, what is df/dx?

Q105. What is a well-designed/well-fitted model?

Q107. Fill in the blanks: Two multivariate imputer techniques are the ___ imputer and the ___ imputer.

Q108. You are working on a regression model using the Keras library. What method on the Model class do you use to train the model?

Q109. What is the goal of regularization in the K nearest neighbors algorithm?

Q110. If there is no trend between two variables x and y, we say that there is a ___ connection between x and y.

Q111. If you are thinking about using machine learning algorithms, the best thing you can do today is to ensure you have quality ___.

Explanation: “Ensuring you have good data quality prior to running machine learning algorithms is a crucial step within the overall data science and machine learning workflow.”

Q112. Your organization’s chief diversity officer is concerned that your engineering department lacks racial and gender diversity. You are asked to create a supervised machine learning system to help sort through hundreds of thousands of new employment applications. The human resources department insists on using internal hiring data. What are some of the dangers that you might run into?

Explanation: “If an AI is trained on a biased data set, it will naturally make biased decisions which can give calamitous results.”

Q113. In 2013, Google’s DeepMind project created a machine learning algorithm that could play an old-style Atari video game, Pong. The algorithm taught the machine how to play by creating a series of rewards. Each time the machine successfully returned the ball, the machine got a reward; each time the opponent missed the ball, the machine got a reward. How would you describe this type of machine learning algorithm?

Explanation: Reinforcement learning is the branch of machine learning where the algorithm interacts with the environment and receives rewards or penalties.

Q114. An organization that owns dozens of shopping malls wants to create a machine learning product that will use facial recognition to identify customers. What is one of the main challenges with developing such a product?

Explanation: There are many ethical questions about consent and privacy in machine learning algorithms.