Data mining the Web [electronic resource] : uncovering patterns in Web content, structure, and usage /
Markov, Zdravko, 1956-
Data mining the Web [electronic resource] : uncovering patterns in Web content, structure, and usage / Zdravko Markov and Daniel T. Larose. - Hoboken, N.J. : Wiley-Interscience, c2007. - 1 online resource (236 p.). - Wiley series on methods and applications in data mining.
Description based upon print version of record.
Includes bibliographical references and index.
DATA MINING THE WEB; CONTENTS; PREFACE; ACKNOWLEDGMENTS; PART I WEB STRUCTURE MINING; 1 INFORMATION RETRIEVAL AND WEB SEARCH; Web Challenges; Web Search Engines; Topic Directories; Semantic Web; Crawling the Web; Web Basics; Web Crawlers; Indexing and Keyword Search; Document Representation; Implementation Considerations; Relevance Ranking; Advanced Text Search; Using the HTML Structure in Keyword Search; Evaluating Search Quality; Similarity Search; Cosine Similarity; Jaccard Similarity; Document Resemblance; References; Exercises; 2 HYPERLINK-BASED RANKING; Introduction; Social Networks Analysis; PageRank; Authorities and Hubs; Link-Based Similarity Search; Enhanced Techniques for Page Ranking; References; Exercises; PART II WEB CONTENT MINING; 3 CLUSTERING; Introduction; Hierarchical Agglomerative Clustering; k-Means Clustering; Probability-Based Clustering; Finite Mixture Problem; Classification Problem; Clustering Problem; Collaborative Filtering (Recommender Systems); References; Exercises; 4 EVALUATING CLUSTERING; Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation; Minimum Description Length Principle; MDL-Based Model Evaluation; Feature Selection; Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy; References; Exercises; 5 CLASSIFICATION; General Setting and Evaluation Techniques; Nearest-Neighbor Algorithm; Feature Selection; Naive Bayes Algorithm; Numerical Approaches; Relational Learning; References; Exercises; PART III WEB USAGE MINING; 6 INTRODUCTION TO WEB USAGE MINING; Definition of Web Usage Mining; Cross-Industry Standard Process for Data Mining; Clickstream Analysis; Web Server Log Files; Remote Host Field; Date/Time Field; HTTP Request Field; Status Code Field; Transfer Volume (Bytes) Field; Common Log Format; Identification Field; Authuser Field; Extended Common Log Format; Referrer Field; User Agent Field; Example of a Web Log Record;
Microsoft IIS Log Format; Auxiliary Information; References; Exercises; 7 PREPROCESSING FOR WEB USAGE MINING; Need for Preprocessing the Data; Data Cleaning and Filtering; Page Extension Exploration and Filtering; De-Spidering the Web Log File; User Identification; Session Identification; Path Completion; Directories and the Basket Transformation; Further Data Preprocessing Steps; References; Exercises; 8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING; Introduction; Number of Visit Actions; Session Duration; Relationship between Visit Actions and Session Duration; Average Time per Page; Duration for Individual Pages; References; Exercises; 9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION; Introduction; Modeling Methodology; Definition of Clustering; The BIRCH Clustering Algorithm; Affinity Analysis and the A Priori Algorithm; Discretizing the Numerical Variables: Binning
This book introduces the reader to methods of data mining on the web, including uncovering patterns in web content (classification, clustering, language processing), structure (graphs, hubs, metrics), and usage (modeling, sequence analysis, performance).
English.
1280901039 9781280901034 9786610901036 6610901031 0470108096 9780470108093 0470108088 9780470108086
Data mining.
Web databases.
Exploration de données (Informatique)
Bases de données sur le Web.
Data mining
Web databases
QA76.9.D343 / M38 2007
005.74