All are true. The FP-tree Growth algorithm was proposed by: a. Srikant b. Agrawal c. Han et al. The main idea of the algorithm is to maintain a frequent-pattern tree of the data set.
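As a minimal sketch of that idea (items ordered by descending global frequency, one shared prefix path per transaction; the class and function names are illustrative, and the header table of node-links used by the full algorithm is omitted):

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}                 # item -> FPNode

def build_fp_tree(transactions, min_support):
    # Pass 1: keep only items meeting the support threshold.
    freq = Counter(i for t in transactions for i in t)
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = FPNode(None, None)
    # Pass 2: insert each transaction along a shared prefix path,
    # items ordered by descending global frequency.
    for t in transactions:
        items = sorted((i for i in t if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root, freq

tree, freq = build_fp_tree(
    [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"]],
    min_support=2)
print(freq)   # global counts of the frequent items
```

Because frequent transactions share prefixes, the tree stores the crucial, quantitative support information far more compactly than the raw transaction list.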
An extended prefix-tree structure storing crucial and quantitative information about frequent sets is used by the: a. Apriori Algorithm b. Pincer Algorithm c. FP-Tree Growth algorithm d. All of these.
Data warehousing and data mining technologies have extensive potential applications in government, in various central government sectors such as: a. Agriculture b. Rural Development c. Health and Energy d. All of these.
ODS stands for: a. External operational data sources.
Good performance can be achieved in a data mart environment by extensive use of: a. Indexes.
Features of an FP-tree are: i. It is dependent on the support threshold ii. It depends on the ordering of the items iii. It depends on the different values of trees iv. It depends on frequent itemsets with respect to the given information.
The Partition Algorithm executes in: a. One phase b. Two phases c. Three phases.
In its first phase, the Partition Algorithm: a. Logically divides the database into a number of non-overlapping partitions b. Logically divides it into a number of overlapping partitions c. Does not divide it into partitions d. Divides it into non-logical, non-overlapping partitions.
Functions of the second phase of the Partition Algorithm: a. Actual supports of itemsets are generated b. Frequent itemsets are identified.
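A minimal sketch of that two-phase idea (assuming transactions are Python sets and a relative support threshold; the function names are illustrative, and phase one brute-forces small itemsets rather than generating them level-wise as the real algorithm does):

```python
from itertools import combinations

def local_frequent(partition, min_frac, max_len=3):
    """Phase 1: itemsets frequent within one partition."""
    found = set()
    items = {i for t in partition for i in t}
    for k in range(1, max_len + 1):
        for cand in combinations(sorted(items), k):
            support = sum(1 for t in partition if set(cand) <= t)
            if support >= min_frac * len(partition):
                found.add(frozenset(cand))
    return found

def partition_algorithm(db, n_parts, min_frac):
    size = max(1, len(db) // n_parts)
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # Phase 1: union of locally frequent itemsets = global candidates.
    candidates = set().union(*(local_frequent(p, min_frac) for p in parts))
    # Phase 2: one full scan to measure actual (global) support.
    return {c for c in candidates
            if sum(1 for t in db if c <= t) >= min_frac * len(db)}

db = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"a", "c"}]
print(partition_algorithm(db, n_parts=2, min_frac=0.5))
```

Any globally frequent itemset must be locally frequent in at least one partition, so phase two only has to verify the candidate union against the full database.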
The Partition Algorithm is based on the: a. Size of the global candidate set b. Size of the local candidate set c. Size of frequent itemsets d. Size of itemsets.
The Pincer-search algorithm is based on the principle of: a. Bottom-up b. Top-down c. Bi-directional.
The Pincer-Search Method Algorithm contains: i. Frequent itemset generation in a bottom-up manner ii. A recovery procedure to recover candidates iii. A list of maximal frequent itemsets iv. Generation of a number of partitions.
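The top-down half of the pincer maintains a list of maximal frequent candidate sets (MFCS). Whenever the bottom-up pass discovers an infrequent itemset, every MFCS member containing it is split so it no longer covers that itemset. A minimal sketch of that update step, assuming itemsets are Python frozensets (the function name is illustrative, not from a library):

```python
def update_mfcs(mfcs, infrequent_sets):
    """Split MFCS members so none contains a known-infrequent itemset."""
    mfcs = list(mfcs)
    for inf in infrequent_sets:
        for m in [m for m in mfcs if inf <= m]:
            mfcs.remove(m)
            for item in inf:
                candidate = m - {item}   # drop one item of the infrequent set
                # Keep only maximal candidates.
                if not any(candidate <= other for other in mfcs):
                    mfcs.append(candidate)
    return mfcs

# {a,b,c,d} splits once {b,d} is found infrequent:
print(update_mfcs([frozenset("abcd")], [frozenset("bd")]))
# -> [frozenset({'a','c','d'}), frozenset({'a','b','c'})] (order may vary)
```

Any candidate whose superset in the MFCS is confirmed frequent can be skipped by the bottom-up pass, which is why the method pays off when the largest frequent itemset is long.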
Which is a full-breadth search, where no background knowledge of frequent itemsets is used for pruning? a. Level-cross filtering by single item b. Level-by-level independent c. Multi-level mining with uniform support d. Multi-level mining with reduced support.
A disadvantage of uniform support: items at lower levels of abstraction are unlikely to occur as frequently as those at higher levels, so if the minimum support threshold is set too high, it could miss several meaningful associations.
The warehouse administrator is responsible for: a. Administration.
The Pincer-search has an advantage over the Apriori algorithm when the largest frequent itemset is long.
What are the common approaches to tree pruning? The prepruning and postpruning approaches.
Tree pruning addresses the problem of: a. Overfitting the branches b. Overfitting the data.
MDL stands for: a. Maximum Description Length b. Minimum Description Length c. Mean Described Length d. Minimum Described Length.
a. Heap b. Subset c. Leaf d. Inter-record distance.
POS stands for: a. Peer of sale b. Point of sale.
Classification and prediction are two forms of: a. Data analysis b. Decision tree.
Classification predicts: a. Categorical labels. Prediction models continuous-valued functions.
Each tuple is assumed to belong to a predefined class, as determined by one of the attributes, called the class-label attribute.
The individual tuples making up the training set are referred to as training samples. Classification and regression are the two major types of data analysis: classification is used to predict discrete or nominal values, while regression is used to predict continuous or ordered values.
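As a small illustration of that distinction, assuming scikit-learn is available (the toy data is invented): a decision-tree classifier predicts a discrete label, while a decision-tree regressor predicts a continuous value.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]            # one numeric feature

# Classification: discrete/nominal target.
clf = DecisionTreeClassifier().fit(X, ["no", "no", "yes", "yes"])
print(clf.predict([[2.5]]))        # -> ['yes']

# Regression: continuous/ordered target.
reg = DecisionTreeRegressor().fit(X, [0.0, 0.9, 2.1, 2.9])
print(reg.predict([[2.5]]))        # -> a continuous value near 2.1-2.9
```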
Classification and prediction have numerous applications: a. Credit approval b. Medical diagnosis.
When the class label of each training sample is provided, the step is known as: a. Unsupervised learning b. Supervised learning c. Training samples.
The decision tree is based on: a. A bottom-up technique b. A top-down technique c. A divide-and-conquer manner d. A top-down recursive divide-and-conquer manner.
Recursive partitioning stops in a decision tree when: a. All samples for a given node belong to the same class b. There are no remaining attributes on which the samples may be further partitioned c. There are no samples for the branch test d. All of the above.
To select the test attribute at each node in a decision tree we use the: a. Entity selection measure b. Data selection measure c. Information gain measure.
The test attribute for the current node in the decision tree is chosen on the basis of the: a. Lowest entity gain b. Highest data gain c. Highest information gain d. Lowest attribute gain.
The advantage of the information-theoretic approach to decision trees: a. Minimizes the expected number of tests needed b. Minimizes the number of nodes c. Maximizes the number of nodes d. Maximizes the number of tests.
Let s be the number of samples in a set S, and let si be the number of samples of S in class Ci, for i = 1, 2, …, m. The expected information needed to classify a given sample is I(s1, s2, …, sm) = −Σ pi log2(pi), where pi = si/s is the probability that an arbitrary sample belongs to class Ci.
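A minimal numeric check of that measure (pure Python; the 9-versus-5 class counts are the classic textbook toy split, and the function names are illustrative):

```python
from math import log2

def expected_info(class_counts):
    """I(s1, ..., sm) = -sum(pi * log2(pi)) with pi = si / s."""
    s = sum(class_counts)
    return -sum((si / s) * log2(si / s) for si in class_counts if si)

def information_gain(parent_counts, partitions):
    """Gain(A) = I(parent) - weighted sum of I over A's partitions."""
    s = sum(parent_counts)
    remainder = sum(sum(p) / s * expected_info(p) for p in partitions)
    return expected_info(parent_counts) - remainder

# 9 "yes" vs 5 "no"; attribute A splits them into three subsets:
print(expected_info([9, 5]))                               # ~0.940
print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # ~0.246
```

The attribute with the highest gain becomes the test attribute for the current node, which is what minimizes the expected number of tests needed.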
Steps applied to the data in order to improve accuracy, efficiency, and scalability are: a. Data cleaning b. Relevance analysis c. Data transformation d. All of the above.
Data cleaning is the process used to remove or reduce noise and to treat missing values.
Relevance analysis may be performed on the data by removing any irrelevant attributes from the process.
Classification and prediction methods can be affected by criteria such as interpretability.
In a decision tree, an internal node denotes a test on an attribute, and leaf nodes represent classes or class distributions.
a. CLARA b. DARA c. PAM.
CLARA stands for: a. Clustering Large Applicant b. Close Large Applicant c. Clustering Large Applications.
a. CLARA b. Both a and b.
Which are the two types of hierarchical clustering? Agglomerative and divisive.
What is a cluster? The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.
A cluster of data objects can be treated collectively as one group in many applications. Cluster analysis is an important human activity.
Cluster analysis tools are based on: a. k-means.
Retailers keep collecting information about seasonal product sales, transactional data, demographics, and so on. Data mining has proved to be one of the most important tools for identifying useful information in the large pool of information collected over time. It is also used to improve revenue generation and reduce business costs.
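A minimal pure-Python sketch of the k-means idea mentioned above (assuming one-dimensional numeric points, naive initialisation, and a fixed iteration count; the function name is illustrative): assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points.

```python
def kmeans_1d(points, k, iterations=10):
    centroids = points[:k]                      # naive initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: nearest centroid wins.
        for p in points:
            i = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # Update step: centroid becomes the cluster mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 8.0, 8.3, 7.9], k=2)
print(centroids)   # -> roughly [1.0, 8.07]
```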
Factor analysis: Factor analysis is a technique used to reduce a large number of variables to a few factors. It extracts the maximum common variance from all variables and puts it into a common score. As an index of all variables, that score can be used for further analysis.
Data scraping: Data scraping is a technique in which a computer program extracts data from human-readable output coming from another program. It often ignores binary data, display formatting, redundant labels, and other information that is either irrelevant or hinders automated processing. It is generally considered an ad hoc, inelegant technique, used as a last resort when no other mechanism for data interchange is available.
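As a small sketch of scraping human-readable program output (the report text and field names here are invented for illustration), a regular expression can pull structured values out while skipping the labels and layout:

```python
import re

# Invented human-readable output from some other program.
report = """
ORDER REPORT  --  page 1/1
  id: 1042   total:   19.99 USD
  id: 1043   total:  250.00 USD
"""

# Extract only the fields we care about; ignore formatting.
rows = [(int(i), float(t))
        for i, t in re.findall(r"id:\s*(\d+)\s+total:\s*([\d.]+)", report)]
print(rows)   # -> [(1042, 19.99), (1043, 250.0)]
```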
Data cloning: Data cloning refers to a complete and separate copy of a database system that includes the business data, the DBMS software, and any other applications that make up the environment. The clone is both fully functional and separate in its own right. The cloned data may be modified at its inception due to configuration changes or data subsetting.
Regression analysis: Regression analysis is a reliable method of identifying which variables have an impact on a topic of interest. Performing a regression allows you to determine with confidence which factors matter most, which factors can be ignored, and how these factors influence each other.
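A minimal sketch of such an analysis (a simple least-squares fit in pure Python; the data points are invented):

```python
def linear_regression(xs, ys):
    """Least-squares fit of y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Invented data: advertising spend vs. sales.
a, b = linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"sales ~ {a:.2f} + {b:.2f} * spend")   # the slope b shows the impact
```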
a. Single-user database application b. Multiuser database application c. E-commerce database application d. Data mining database application.
Enterprise Resource Planning (ERP): a type of software that organizations use to manage day-to-day business activities such as accounting, project management, and supply chain operations. It is an example of a multiuser database application; hence option 2 is the correct answer.
a. Processing b. Scalability c. Replication d. All of the options. The correct answer is option 3.
NoSQL databases support automatic replication, which means you get high availability and disaster recovery. NoSQL data is modelled in means other than the tabular relations used in relational databases.
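For illustration (an invented record, using only the Python standard library), a document-style NoSQL model stores a nested, schema-flexible structure rather than flat table rows:

```python
import json

# One self-contained "document": nested fields instead of table rows.
customer = {
    "_id": "c-001",
    "name": "Asha",
    "orders": [                       # variable-length, no join needed
        {"sku": "A17", "qty": 2},
        {"sku": "B03", "qty": 1},
    ],
}
print(json.dumps(customer, indent=2))
```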
The main advantages are high scalability and high availability.
a. Buffer b. Virtual memory c. Staging area d. Inter-storage area.
Concept: Data warehousing is the process of collecting and managing data from different sources and using it for business purposes.
Explanation: A data warehouse usually contains historical data derived from transactional data.
a. Identify available migration tools b. Test before migrating c. Improve the data d. Find the data. The correct answer is to find the data.
Key Points — Data Migration: Data migration is one of the recent developments in technology. Data migration can be of the following types:
Storage Migration: a process where data is moved from an existing system to a new system while access to the data is maintained. It uses the cloning method of data migration.
Cloud Migration: a process of moving data from a cloud data center, or from on-premises storage, to another cloud storage system.
Application Migration: a process of moving data from one environment to another. It includes movement from on-premises systems to a cloud, or from one cloud to another. The correct answer is option 3.
a. Database b. Data warehouse c. Data mining.
Data warehouse: A data warehouse, also known as an enterprise data warehouse, is the electronic storage of a large amount of information by a business. It is a process of collecting and managing data from various internal and external sources, compiled by the firm to provide meaningful insights.
It is a system used for reporting and data analysis, and is considered a core aspect of business intelligence. A data warehouse is intended solely for queries and analysis and contains a large amount of historical data.
Database: A database is a collection of structured data and information stored in a computer system in an electronic format. The data and the DBMS, together with the applications associated with them, are referred to as a database system.
Data mining: Data mining is the process of extracting useful data from a large set of raw data. It is usually applied to credit ratings and to intelligent anti-fraud systems to analyze transactions, card transactions, purchasing patterns, and other customer financial data.
A Management Information System (MIS) is the study of people, organizations, technology, and the relationships among them.
This helps organizations realize the maximum benefit from investments in personnel, equipment, and business processes.
The Facebook API is a platform for building applications that are available to the members of the Facebook social network. The API allows applications to use social connections and profile information to make applications more engaging.
a. Dedicated parity b. Double parity c. Hamming-code parity d. Distributed parity.