
DSA-C02 PDF Exam Material 2023 Realistic DSA-C02 Dumps Questions
Updated Snowflake DSA-C02 Dumps – PDF & Online Engine
NEW QUESTION # 32
Mark the incorrect understanding of a Data Scientist about Streams?
- A. Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views.
- B. A stream itself does not contain any table data.
- C. Streams can track changes in materialized views.
- D. Streams do not support repeatable read isolation.
Answer: C,D
Explanation:
Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views. Currently, streams cannot track changes in materialized views.
A stream itself does not contain any table data. A stream only stores an offset for the source object and returns CDC records by leveraging the versioning history for the source object. When the first stream for a table is created, several hidden columns are added to the source table and begin storing change tracking metadata.
These columns consume a small amount of storage. The CDC records returned when querying a stream rely on a combination of the offset stored in the stream and the change tracking metadata stored in the table. Note that for streams on views, change tracking must be enabled explicitly for the view and underlying tables to add the hidden columns to these tables.
Streams support repeatable read isolation. In repeatable read mode, multiple SQL statements within a transaction see the same set of records in a stream. This differs from the read committed mode supported for tables, in which statements see any changes made by previous statements executed within the same transaction, even though those changes are not yet committed.
The delta records returned by streams in a transaction are the range from the current position of the stream until the transaction start time. The stream position advances to the transaction start time if the transaction commits; otherwise it stays at the same position.
NEW QUESTION # 33
Mark the incorrect statement regarding usage of Snowflake Stream & Tasks?
- A. Snowflake ensures only one instance of a task with a schedule (i.e. a standalone task or the root task in a DAG) is executed at a given time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.
- B. Snowflake automatically resizes and scales the compute resources for serverless tasks.
- C. Streams support repeatable read isolation.
- D. A standard-only stream tracks row inserts only.
Answer: D
Explanation:
All are correct except "A standard-only stream tracks row inserts only."
A standard (i.e. delta) stream tracks all DML changes to the source object, including inserts, updates, and deletes (including table truncates). It is an append-only stream that tracks row inserts only.
NEW QUESTION # 34
Mark the incorrect statement regarding Python UDF?
- A. A scalar function (UDF) returns a tabular value for each input row
- B. A UDF also gives you a way to encapsulate functionality so that you can call it repeatedly from multiple places in code
- C. Python UDFs can contain both new code and calls to existing packages
- D. For each row passed to a UDF, the UDF returns either a scalar (i.e. single) value or, if defined as a table function, a set of rows.
Answer: A
Explanation:
A scalar function (UDF) returns one output row for each input row. The returned row consists of a single column/value.
NEW QUESTION # 35
Select the Data Science Tools which are known to provide native connectivity to Snowflake?
- A. DiYotta
- B. DvSUM
- C. Denodo
- D. HEX
Answer: D
Explanation:
Hex - collaborative data science and analytics platform
Denodo - data virtualization and federation platform
DvSum - data catalog and data intelligence platform
Diyotta - data integration and migration
NEW QUESTION # 36
Which pandas method can a Data Scientist use to remove duplicates?
- A. duplicates()
- B. clean_duplicates()
- C. remove_duplicates()
- D. drop_duplicates()
Answer: D
Explanation:
The drop_duplicates() method removes duplicate rows.
dataframe.drop_duplicates(subset, keep, inplace, ignore_index)
Remove duplicate rows from the DataFrame:
import pandas as pd

data = {
    "name": ["Peter", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
    "qualified": [True, False, False, False]
}

# Build the DataFrame, then drop the repeated "Mary" row
df = pd.DataFrame(data)
newdf = df.drop_duplicates()
NEW QUESTION # 37
Which of the following methods is used for multiclass classification?
- A. loocv
- B. one vs another
- C. all vs one
- D. one vs rest
Answer: D
Explanation:
Binary vs. Multi-Class Classification
Classification problems are common in machine learning. In most cases, developers prefer using a supervised machine-learning approach to predict class labels for a given dataset. Unlike regression, classification involves designing a classifier model and training it to categorize input data. For that, you can frame the problem as either binary or multi-class.
As the name suggests, binary classification involves solving a problem with only two class labels. This makes it easy to filter the data, apply classification algorithms, and train the model to predict outcomes. On the other hand, multi-class classification is applicable when there are more than two class labels in the input train data.
The technique enables developers to categorize the test data into multiple binary class labels.
That said, while binary classification requires only one classifier model, the number of models used in the multi-class approach depends on the classification technique. One such technique, one-vs-rest, is described below.
One-Vs-Rest Classification Model for Multi-Class Classification
Also known as one-vs-all, the one-vs-rest model is a heuristic method that leverages a binary classification algorithm for multi-class classification. The technique involves splitting a multi-class dataset into multiple binary problems. A binary classifier is then trained on each binary problem, and predictions are made using the most confident classifier.
For instance, for a multi-class classification problem with red, green, and blue classes, the binary problems are as follows:
Problem one: red vs. green/blue
Problem two: blue vs. green/red
Problem three: green vs. blue/red
The main challenge of using this model is that you must create a model for every class. The three classes above require three models, which can be challenging for large datasets with millions of rows, slow models such as neural networks, and datasets with a significant number of classes.
The one-vs-rest approach requires each individual model to predict a probability-like score. The class index with the largest score is then used to predict a class. As such, it is commonly used for classification algorithms that can naturally predict scores or numerical class membership, such as perceptron and logistic regression.
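The one-vs-rest procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 2-D points, the helper names (train_perceptron, train_one_vs_rest, predict), and the choice of a simple perceptron as the binary classifier are all hypothetical and not taken from the exam material.

```python
# One-vs-rest sketch: train one binary perceptron per class, then predict
# with the class whose model gives the largest score.
# All data and helper names here are hypothetical.

def train_perceptron(points, targets, max_epochs=1000):
    """Train a binary perceptron; targets are +1 (this class) or -1 (rest)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), t in zip(points, targets):
            score = w[0] * x1 + w[1] * x2 + b
            if t * score <= 0:  # misclassified or on the boundary: update
                w[0] += t * x1
                w[1] += t * x2
                b += t
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b

def train_one_vs_rest(points, labels):
    """Build one 'class vs. everything else' model per distinct label."""
    models = {}
    for cls in sorted(set(labels)):
        targets = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_perceptron(points, targets)
    return models

def predict(models, point):
    """Return the class whose binary model is most confident."""
    x1, x2 = point
    scores = {cls: w[0] * x1 + w[1] * x2 + b for cls, (w, b) in models.items()}
    return max(scores, key=scores.get)

# Three well-separated clusters, one per class
points = [(0, 0), (1, 0), (0, 1), (10, 0), (9, 1), (10, 1), (0, 10), (1, 9), (1, 10)]
labels = ["red"] * 3 + ["blue"] * 3 + ["green"] * 3

models = train_one_vs_rest(points, labels)
predictions = [predict(models, p) for p in points]
print(predictions)
```

Three classes produce three binary models, which is the per-class cost the explanation mentions.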
NEW QUESTION # 38
Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10'].
What does the aggregate method shown in the code below do?
g = df.groupby(df.index.str.len())
g.aggregate({'A':len, 'B':np.sum})
- A. Computes length of column A
- B. Computes Sum of column A values
- C. Computes length of column A and Sum of Column B values of each group
- D. Computes length of column A and Sum of Column B values
Answer: C
Explanation:
The data frame is grouped by the length of each index label. For each group, aggregate computes the length (number of rows) of column A and the sum of the column B values.
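A small runnable sketch may make the grouping concrete. The index labels are the ones from the question; the values in columns A and B are assumptions chosen for illustration. The string "sum" is used in place of np.sum because some recent pandas versions emit a warning when NumPy reducers are passed to aggregate.

```python
import pandas as pd

# Index labels from the question; column values are hypothetical
index = ["r1", "r2", "r3", "row4", "row5", "row6", "r7", "r8", "r9", "row10"]
df = pd.DataFrame({"A": range(10), "B": range(1, 11)}, index=index)

# Group rows by the length of their index label (2, 4, or 5 characters)
g = df.groupby(df.index.str.len())

# Per group: number of rows in column A, and sum of column B
result = g.aggregate({"A": len, "B": "sum"})
print(result)
```

With these values, the six two-character labels form one group, the three four-character labels a second, and row10 a third, so the result has one row per label length.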
NEW QUESTION # 39
Which command is used to install Jupyter Notebook?
- A. pip install jupyter
- B. pip install nbconvert
- C. pip install jupyter-notebook
- D. pip install notebook
Answer: A
Explanation:
Jupyter Notebook is a web-based interactive computational environment.
The command used to install Jupyter Notebook is pip install jupyter.
The command used to start Jupyter Notebook is jupyter notebook.
NEW QUESTION # 40
Which one is not a type of Feature Scaling?
- A. Standard Scaling
- B. Min-Max Scaling
- C. Robust Scaling
- D. Economy Scaling
Answer: D
Explanation:
Feature Scaling
Feature Scaling is the process of transforming the features so that they have a similar scale. This is important in machine learning because the scale of the features can affect the performance of the model.
Types of Feature Scaling:
Min-Max Scaling: Rescaling the features to a specific range, such as between 0 and 1, by subtracting the minimum value and dividing by the range.
Standard Scaling: Rescaling the features to have a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation.
Robust Scaling: Rescaling the features to be robust to outliers, typically by subtracting the median and dividing by the interquartile range.
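The three scaling types listed above can be sketched with the Python standard library. The sample values are hypothetical, and the robust variant shown (subtract the median, divide by the interquartile range) is one common formulation rather than the only one.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical feature with one outlier

# Min-max scaling: subtract the minimum, divide by the range -> values in [0, 1]
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# Standard scaling: subtract the mean, divide by the standard deviation
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standard = [(v - mean) / std for v in values]

# Robust scaling: subtract the median, divide by the interquartile range
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
robust = [(v - median) / (q3 - q1) for v in values]

print(min_max)
print(standard)
print(robust)
```

In this sample the outlier compresses the min-max and standard results for the other four values, while the robust version keeps them evenly spread.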
Benefits of Feature Scaling:
Improves Model Performance: By transforming the features to have a similar scale, the model can learn from all features equally and avoid being dominated by a few large features.
Increases Model Robustness: By transforming the features to be robust to outliers, the model can become more robust to anomalies.
Improves Computational Efficiency: Many machine learning algorithms, such as k-nearest neighbors, are sensitive to the scale of the features and perform better with scaled features.
Improves Model Interpretability: By transforming the features to have a similar scale, it can be easier to understand the model's predictions.
NEW QUESTION # 41
Which ones are the correct rules while using a data science model created via External function in Snowflake?
- A. External functions can accept Model parameters.
- B. External functions return a value. The returned value can be a compound value, such as a VARIANT that contains JSON.
- C. An external function can appear in any clause of a SQL statement in which other types of UDF can appear.
- D. External functions can be overloaded.
Answer: A,B,C,D
Explanation:
From the perspective of a user running a SQL statement, an external function behaves like any other UDF.
External functions follow these rules:
External functions return a value.
External functions can accept parameters.
An external function can appear in any clause of a SQL statement in which other types of UDF can appear. For example:
select my_external_function_2(column_1, column_2)
    from table_1;

select col1
    from table_1
    where my_external_function_3(col2) < 0;

create view view1 (col1) as
    select my_external_function_5(col1)
    from table9;
An external function can be part of a more complex expression:
select upper(zipcode_to_city_external_function(zipcode))
    from address_table;
The returned value can be a compound value, such as a VARIANT that contains JSON.
External functions can be overloaded; two different functions can have the same name but different signatures (different numbers or data types of input parameters).
NEW QUESTION # 42
A Data Scientist acting as a data provider needs to allow consumers to access all databases and database objects in a share by granting a single privilege on shared databases. Which one of the SnowSQL commands she used for this task is incorrect?
Assuming:
A database named product_db exists with a schema named product_agg and a table named Item_agg.
The database, schema, and table will be shared with two accounts named xy12345 and yz23456.
USE ROLE accountadmin;
CREATE DIRECT SHARE product_s;
GRANT USAGE ON DATABASE product_db TO SHARE product_s;
GRANT USAGE ON SCHEMA product_db.product_agg TO SHARE product_s;
GRANT SELECT ON TABLE product_db.product_agg.Item_agg TO SHARE product_s;
SHOW GRANTS TO SHARE product_s;
ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
SHOW GRANTS OF SHARE product_s;
- A. GRANT USAGE ON DATABASE product_db TO SHARE product_s;
- B. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
- C. GRANT SELECT ON TABLE product_db.product_agg.Item_agg TO SHARE product_s;
- D. CREATE DIRECT SHARE product_s;
Answer: D
Explanation:
CREATE SHARE product_s; is the correct SnowSQL command to create a share object; CREATE DIRECT SHARE is not valid syntax.
The rest are correct.
https://docs.snowflake.com/en/user-guide/data-sharing-provider#creating-a-share-using-sql
NEW QUESTION # 43
Which ones are the known limitations of using External function?
- A. Currently, external functions must be scalar functions. A scalar external function returns a single value for each input row.
- B. External functions have more overhead than internal functions (both built-in functions and internal UDFs) and usually execute more slowly.
- C. Currently, external functions cannot be shared with data consumers via Secure Data Sharing.
- D. An external function accessed through an AWS API Gateway private endpoint can be accessed only from a Snowflake VPC (Virtual Private Cloud) on AWS and in the same AWS region.
Answer: A,B,C,D
NEW QUESTION # 44
All aggregate functions except _____ ignore null values in their input collection
- A. Count(*)
- B. Avg
- C. Count(attribute)
- D. Sum
Answer: A
Explanation:
Count(*)
The * selects all rows, including those with null values, so Count(*) does not ignore nulls.
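The difference can be checked with a short sketch. It uses Python's built-in sqlite3 module as a stand-in SQL engine (an assumption made for illustration; the question is not specific to SQLite), and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (val INTEGER)")
# Two real values and two NULLs
conn.executemany("INSERT INTO scores VALUES (?)", [(10,), (None,), (20,), (None,)])

count_star, count_val, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(val), SUM(val), AVG(val) FROM scores"
).fetchone()

# COUNT(*) counts all four rows; the others only see the two non-NULL values
print(count_star, count_val, total, average)
conn.close()
```

AVG divides by the number of non-NULL values, not by the row count, which is why it comes out as 15 here rather than 7.5.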
NEW QUESTION # 45
Which of the following functions support windowing?
- A. EXTRACT
- B. ENCRYPT
- C. LISTAGG
- D. HASH_AGG
Answer: C,D
Explanation:
What is a Window?
A window is a group of related rows. For example, a window might be defined based on timestamps, with all rows in the same month grouped in the same window. Or a window might be defined based on location, with all rows from a particular city grouped in the same window.
A window can consist of zero, one, or multiple rows. For simplicity, Snowflake documentation usually says that a window contains multiple rows.
What is a Window Function?
A window function is any function that operates over a window of rows.
A window function is generally passed two parameters:
A row. More precisely, a window function is passed 0 or more expressions. In almost all cases, at least one of those expressions references a column in that row. (Most window functions require at least one column or expression, but a few window functions, such as some rank-related functions, do not require an explicit column or expression.)
A window of related rows that includes that row. The window can be the entire table, or a subset of the rows in the table.
For non-window functions, all arguments are usually passed explicitly to the function, for example:
MY_FUNCTION(argument1, argument2, ...)
Window functions behave differently; although the current row is passed as an argument the normal way, the window is passed through a separate clause, called an OVER clause. The syntax of the OVER clause is documented later.
LISTAGG
Returns the concatenated input values, separated by the delimiter string.
Window function
LISTAGG( [ DISTINCT ] <expr1> [, <delimiter> ] )
    [ WITHIN GROUP ( <orderby_clause> ) ]
    OVER ( [ PARTITION BY <expr2> ] )
HASH_AGG
Returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. HASH_AGG never returns NULL, even if no input is provided. Empty input "hashes" to 0.
Window function
HASH_AGG( [ DISTINCT ] <expr> [ , <expr2> ... ] ) OVER ( [ PARTITION BY <expr3> ] )
HASH_AGG(*) OVER ( [ PARTITION BY <expr3> ] )
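To show what passing the window through an OVER clause looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption standing in for Snowflake: it has no LISTAGG, so its group_concat aggregate plays that role, and the table and column names are hypothetical. Window functions need a reasonably recent SQLite (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (city TEXT, visitor TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("Oslo", "ann"), ("Oslo", "bob"), ("Rome", "cy")],
)

# Each row keeps its identity; the OVER clause defines the window
# (all rows for the same city) that the aggregate operates on.
rows = conn.execute(
    """
    SELECT city,
           visitor,
           COUNT(*) OVER (PARTITION BY city) AS visitors_in_city,
           group_concat(visitor, ',') OVER (PARTITION BY city) AS all_visitors
    FROM visits
    ORDER BY city, visitor
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Unlike a GROUP BY query, the output still has one row per input row; each row simply carries the aggregate computed over its window.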
NEW QUESTION # 46
Mark the Incorrect statements regarding MIN / MAX Functions?
- A. NULL values are skipped unless all the records are NULL
- B. For compatibility with other systems, the DISTINCT keyword can be specified as an argument for MIN or MAX, but it does not have any effect
- C. NULL values are ignored unless all the records are NULL, in which case a NULL value is returned
- D. The data type of the returned value is the same as the data type of the input values
Answer: A
Explanation:
For MIN / MAX, NULL values are ignored unless all the records are NULL, in which case a NULL value is returned. Options B, C, and D match the documented behavior.
NEW QUESTION # 47
Which tools help a data scientist manage the ML lifecycle & model versioning?
- A. CRUX
- B. Pachyderm
- C. MLFlow
- D. Albert
Answer: B,C
Explanation:
Model versioning involves tracking the changes made to an ML model that has been previously built.
Put differently, it is the process of making changes to the configurations of an ML Model. From another perspective, we can see model versioning as a feature that helps Machine Learning Engineers, Data Scientists, and related personnel create and keep multiple versions of the same model.
Think of it as a way of taking notes of the changes you make to the model through tweaking hyperparameters, retraining the model with more data, and so on.
In model versioning, a number of things need to be versioned, to help us keep track of important changes. I'll list and explain them below:
Implementation code: From the early days of model building to optimization stages, code or in this case source code of the model plays an important role. This code experiences significant changes during optimization stages which can easily be lost if not tracked properly. Because of this, code is one of the things that are taken into consideration during the model versioning process.
Data: In some cases, training data does improve significantly from its initial state during model optimization phases. This can be a result of engineering new features from existing ones to train our model on. There is also metadata (data about your training data and model) to consider versioning. Metadata can change many times over without the training data actually changing. We need to be able to track these changes through versioning.
Model: The model is a product of the two previous entities and, as stated in their explanations, an ML model changes at different points of the optimization phases through hyperparameter settings, model artifacts, and learning coefficients. Versioning helps keep a record of the different versions of a Machine Learning model.
MLFlow & Pachyderm are the tools used to manage ML lifecycle & Model versioning.
NEW QUESTION # 48
Which of the following are key actions in the data collection phase of machine learning?
- A. Label
- B. Measure
- C. Ingest and Aggregate
- D. Probability
Answer: A,C
Explanation:
The key actions in the data collection phase include:
Label: Labeled data is the raw data that was processed by adding one or more meaningful tags so that a model can learn from it. It will take some work to label it if such information is missing (manually or automatically).
Ingest and Aggregate: Incorporating and combining data from many data sources is part of data collection in AI.
Data collection
Collecting data for training the ML model is the basic step in the machine learning pipeline. The predictions made by ML systems can only be as good as the data on which they have been trained. Following are some of the problems that can arise in data collection:
Inaccurate data. The collected data could be unrelated to the problem statement.
Missing data. Sub-data could be missing. That could take the form of empty values in columns or missing images for some class of prediction.
Data imbalance. Some classes or categories in the data may have a disproportionately high or low number of corresponding samples. As a result, they risk being under-represented in the model.
Data bias. Depending on how the data, subjects and labels themselves are chosen, the model could propagate inherent biases on gender, politics, age or region, for example. Data bias is difficult to detect and remove.
Several techniques can be applied to address those problems:
Pre-cleaned, freely available datasets. If the problem statement (for example, image classification, object recognition) aligns with a clean, pre-existing, properly formulated dataset, then take advantage of existing, open-source expertise.
Web crawling and scraping. Automated tools, bots and headless browsers can crawl and scrape websites for data.
Private data. ML engineers can create their own data. This is helpful when the amount of data required to train the model is small and the problem statement is too specific to generalize over an open-source dataset.
Custom data. Agencies can create or crowdsource the data for a fee.
NEW QUESTION # 49
Which of the following is a common evaluation metric for binary classification?
- A. Mean squared error (MSE)
- B. F1 score
- C. Accuracy
- D. Area under the ROC curve (AUC)
Answer: D
Explanation:
The area under the ROC curve (AUC) is a common evaluation metric for binary classification, which measures the performance of a classifier at different threshold values for the predicted probabilities. Other common metrics include accuracy, precision, recall, and F1 score, which are based on the confusion matrix of true positives, false positives, true negatives, and false negatives.
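The metrics named above can be computed by hand in a short sketch. The labels and predicted scores are hypothetical, the 0.5 threshold is an assumption, and AUC is computed here through its pairwise-ranking interpretation (the fraction of positive/negative pairs in which the positive example gets the higher score) rather than by tracing the ROC curve.

```python
# Hypothetical ground truth and predicted probabilities
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

# Threshold the scores to get hard predictions (0.5 is an assumed cutoff)
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

# Confusion-matrix counts
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# AUC: share of (positive, negative) pairs ranked correctly; ties count half
pos = [s for t, s in zip(y_true, y_score) if t == 1]
neg = [s for t, s in zip(y_true, y_score) if t == 0]
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))

print(accuracy, precision, recall, f1, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, while AUC is computed from the scores alone and summarizes performance across all thresholds.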
NEW QUESTION # 50
Which is the visual depiction of data through the use of graphs, plots, and informational graphics?
- A. Data Virtualization
- B. Data Mining
- C. Data visualization
- D. Data Interpretation
Answer: C
Explanation:
Data visualization is the visual depiction of data through the use of graphs, plots, and informational graphics.
Its practitioners use statistics and data science to convey the meaning behind data in ethical and accurate ways.
NEW QUESTION # 51
Which type of Python UDFs let you define Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series?
- A. MPP Python UDFs
- B. Vectorized Python UDFs
- C. Scalar Python UDFs
- D. Hybrid Python UDFs
Answer: B
Explanation:
Vectorized Python UDFs let you define Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series. You call vectorized Python UDFs the same way you call other Python UDFs.
Advantages of using vectorized Python UDFs compared to the default row-by-row processing pattern include:
The potential for better performance if your Python code operates efficiently on batches of rows.
Less transformation logic required if you are calling into libraries that operate on Pandas DataFrames or Pandas arrays.
When you use vectorized Python UDFs:
You do not need to change how you write queries using Python UDFs. All batching is handled by the UDF framework rather than your own code.
As with non-vectorized UDFs, there is no guarantee of which instances of your handler code will see which batches of input.
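The batch-in, batch-out shape of such a handler can be illustrated locally with plain pandas. This is only a sketch under assumptions: the handler name add_inputs is hypothetical, the batching loop merely imitates what the UDF framework would do, and the Snowflake-side step of registering the function as vectorized is omitted entirely.

```python
import pandas as pd

def add_inputs(df: pd.DataFrame) -> pd.Series:
    """Handler in the vectorized style: a batch of rows in, a Series of results out."""
    # Assumption: arguments arrive as positional columns 0 and 1
    return df[0] + df[1]

# Local stand-in for the framework: split the input rows into two batches
all_rows = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]])
batches = [all_rows.iloc[:2], all_rows.iloc[2:]]

# Each batch is processed in one call instead of one call per row
results = pd.concat([add_inputs(batch) for batch in batches])
print(results.tolist())
```

The handler never loops over rows itself; it relies on pandas operating on whole columns, which is where the potential performance benefit comes from.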
NEW QUESTION # 52
For a Data Scientist looking to use a reader account, which of the following are correct considerations about Reader Accounts for Third-Party Access?
- A. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
- B. Each reader account belongs to the provider account that created it.
- C. Users in a reader account can query data that has been shared with the reader account, but cannot perform any of the DML tasks that are allowed in a full account, such as data loading, insert, update, and similar data manipulation operations.
- D. Data sharing is only possible between Snowflake accounts.
Answer: A,B,C
Explanation:
Data sharing is only supported between Snowflake accounts. As a data provider, you might want to share data with a consumer who does not already have a Snowflake account or is not ready to become a licensed Snowflake customer.
To facilitate sharing data with these consumers, you can create reader accounts. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
Each reader account belongs to the provider account that created it. As a provider, you use shares to share databases with reader accounts; however, a reader account can only consume data from the provider account that created it.
So data sharing with consumers who are not Snowflake customers is possible via a reader account.
NEW QUESTION # 53
In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
- A. no change
- B. by its slope
- C. by intercept
- D. by 1
Answer: B
Explanation:
What is linear regression?
Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.
A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
For linear regression, Y = a + bx + error.
If we neglect the error, then Y = a + bx. If x increases by 1, the new value is a + b(x+1) = a + bx + b, which is the old value of Y plus b. So Y changes by its slope.
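The same argument can be checked numerically. The intercept and slope values below are arbitrary assumptions chosen for illustration.

```python
def predict(x, a=2.0, b=3.0):
    """Simple linear regression line Y = a + b*x (error term neglected)."""
    return a + b * x

# Increase the input by exactly one unit and compare the outputs
change = predict(6.0) - predict(5.0)
print(change)  # equals the slope b, regardless of the starting x
```

The intercept cancels in the subtraction, so only the slope determines the change.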
NEW QUESTION # 54
Consider a data frame df with columns ['A', 'B', 'C', 'D'] and rows ['r1', 'r2', 'r3']. What does the expression df[lambda x : x.index.str.endswith('3')] do?
- A. Returns the row name r3
- B. Results in Error
- C. Returns the third column
- D. Filters the row labelled r3
Answer: D
Explanation:
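It filters the row labelled r3. The lambda receives the data frame and returns a boolean mask that is True only for index labels ending in '3'; indexing with that mask keeps the matching rows.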
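A quick check with pandas, using the row and column labels from the question; the cell values are assumptions made for illustration.

```python
import pandas as pd

# Labels from the question; cell values are hypothetical
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=["r1", "r2", "r3"],
    columns=["A", "B", "C", "D"],
)

# The callable is evaluated against df and yields a boolean mask over the rows
result = df[lambda x: x.index.str.endswith("3")]
print(result)
```

The result is still a data frame with all four columns, containing only the row r3.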
NEW QUESTION # 55
......