Learning Data Science (数据科学入门), English reprint edition
Sam Lau, Joseph Gonzalez, Deborah Nolan
Publication date: March 2024
Pages: 594
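“I wish I had had this book when we first used the term ‘data scientist’ to describe our work. If you want to work in data science/engineering, AI, or machine learning, this book is your starting point.”
DJ Patil, PhD
First US Chief Data Scientist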

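As an aspiring data scientist, you understand why organizations rely on data to make important decisions. This holds whether the organization is a company designing a website, a city deciding how to improve its services, or a team of scientists working to stop the spread of disease. You need the skills to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, and analyzing data, and drawing conclusions from it.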
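This is the first book to cover the entire data science lifecycle while balancing foundational skills in both programming and statistics. It is written for readers who want to become data scientists or work alongside them. It also suits data analysts who want to cross the “technical/non-technical” divide. With a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools such as pandas.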
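● Refine a question of interest into one that can be studied with data
● Apply the techniques that data collection may involve, such as text processing and web scraping
● Glean valuable insights through data cleaning, exploration, and visualization
● Learn how to use modeling to describe data
● Generalize findings beyond the data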
Contents
  Preface
  Part I. The Data Science Lifecycle
    1. The Data Science Lifecycle
      The Stages of the Lifecycle
      Examples of the Lifecycle
      Summary
    2. Questions and Data Scope
      Big Data and New Opportunities
      Target Population, Access Frame, and Sample
      Instruments and Protocols
      Measuring Natural Phenomena
      Accuracy
      Summary
    3. Simulation and Data Design
      The Urn Model
      Example: Simulating Election Poll Bias and Variance
      Example: Simulating a Randomized Trial for a Vaccine
      Example: Measuring Air Quality
      Summary
    4. Modeling with Summary Statistics
      The Constant Model
      Minimizing Loss
      Summary
    5. Case Study: Why Is My Bus Always Late?
      Question and Scope
      Data Wrangling
      Exploring Bus Times
      Modeling Wait Times
      Summary
  Part II. Rectangular Data
    6. Working with Dataframes Using pandas
      Subsetting
      Aggregating
      Joining
      Transforming
      How Are Dataframes Different from Other Data Representations?
      Summary
    7. Working with Relations Using SQL
      Subsetting
      Aggregating
      Joining
      Transforming and Common Table Expressions
      Summary
  Part III. Understanding the Data
    8. Wrangling Files
      Data Source Examples
      File Formats
      File Encoding
      File Size
      The Shell and Command-Line Tools
      Table Shape and Granularity
      Summary
    9. Wrangling Dataframes
      Example: Wrangling CO2 Measurements from the Mauna Loa Observatory
      Quality Checks
      Missing Values and Records
      Transformations and Timestamps
      Modifying Structure
      Example: Wrangling Restaurant Safety Violations
      Summary
    10. Exploratory Data Analysis
      Feature Types
      What to Look For in a Distribution
      What to Look For in a Relationship
      Comparisons in Multivariate Settings
      Guidelines for Exploration
      Example: Sale Prices for Houses
      Summary
    11. Data Visualization
      Choosing Scale to Reveal Structure
      Smoothing and Aggregating Data
      Facilitating Meaningful Comparisons
      Incorporating the Data Design
      Adding Context
      Creating Plots Using plotly
      Other Tools for Visualization
      Summary
    12. Case Study: How Accurate Are Air Quality Measurements?
      Question, Design, and Scope
      Finding Collocated Sensors
      Wrangling and Cleaning AQS Sensor Data
      Wrangling PurpleAir Sensor Data
      Exploring PurpleAir and AQS Measurements
      Creating a Model to Correct PurpleAir Measurements
      Summary
  Part IV. Other Data Sources
    13. Working with Text
      Examples of Text and Tasks
      String Manipulation
      Regular Expressions
      Text Analysis
      Summary
    14. Data Exchange
      NetCDF Data
      JSON Data
      HTTP
      REST
      XML, HTML, and XPath
      Summary
  Part V. Linear Modeling
    15. Linear Models
      Simple Linear Model
      Example: A Simple Linear Model for Air Quality
      Fitting the Simple Linear Model
      Multiple Linear Model
      Fitting the Multiple Linear Model
      Example: Where Is the Land of Opportunity?
      Feature Engineering for Numeric Measurements
      Feature Engineering for Categorical Measurements
      Summary
    16. Model Selection
      Overfitting
      Train-Test Split
      Cross-Validation
      Regularization
      Model Bias and Variance
      Summary
    17. Theory for Inference and Prediction
      Distributions: Population, Empirical, Sampling
      Basics of Hypothesis Testing
      Bootstrapping for Inference
      Basics of Confidence Intervals
      Basics of Prediction Intervals
      Probability for Inference and Prediction
      Summary
    18. Case Study: How to Weigh a Donkey
      Donkey Study Question and Scope
      Wrangling and Transforming
      Exploring
      Modeling a Donkey’s Weight
      Summary
  Part VI. Classification
    19. Classification
      Example: Wind-Damaged Trees
      Modeling and Classification
      Modeling Proportions (and Probabilities)
      A Loss Function for the Logistic Model
      From Probabilities to Classification
      Summary
    20. Numerical Optimization
      Gradient Descent Basics
      Minimizing Huber Loss
      Convex and Differentiable Loss Functions
      Variants of Gradient Descent
      Summary
    21. Case Study: Detecting Fake News
      Question and Scope
      Obtaining and Wrangling the Data
      Exploring the Data
      Modeling
      Summary
  Additional Material
    Data Sources
    Index
Title: 数据科学入门 (Learning Data Science), English reprint edition
Publisher in China: Southeast University Press
ISBN: 978-1098113001
Original title: Learning Data Science
Original publisher: O'Reilly Media
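Sam Lau

Sam Lau is an assistant teaching professor at the Halicioglu Data Science Institute at UC San Diego. Sam has a decade of teaching experience and has designed and taught flagship data science courses at both UC Berkeley and UC San Diego.

Joseph Gonzalez

Joey Gonzalez is an associate professor in the Department of Electrical Engineering and Computer Sciences at UC Berkeley, a member of the Berkeley AI Research group, and a founding member of the Berkeley RISE Lab. He also cofounded Turi Inc. and Aqueduct, two companies that build tools for data scientists.

Deborah Nolan

Deborah Nolan is Professor Emerita of Statistics and Associate Dean for Students in the College of Computing, Data Science, and Society at UC Berkeley.
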
The animal on the cover of Learning Data Science is an edible dormouse (Glis glis). As you might suspect, these creatures have wound up in human cuisine. The edible dormouse was served grilled as a delicacy in ancient Rome and is still consumed today in Croatia and Slovenia. Edible dormice have squirrel-like bodies with small ears, short legs, large feet, and long, bushy tails. Their front feet have four digits and their hind feet have five. They are predominantly covered in gray to gray-brown fur with white underbellies. Their feet have naked soles that secrete a sticky substance that enables climbing.
These nocturnal creatures spend most of their time in trees. They can be found across Europe and in parts of western and central Asia. While the IUCN categorizes edible dormice as a species of Least Concern, they are threatened by illegal hunting and habitat loss. Many of the animals on O’Reilly covers are endangered; all of them are important to the world. The cover illustration is by Karen Montgomery, based on an antique line engraving from Lydekker’s Royal Natural History.
Purchase options
List price: CNY 169.00