Training Data for Machine Learning (Reprint Edition)
Anthony Sarkis
Publication date: March 2024
Pages: 329
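“This book gives a full 360-degree view of how to generate high-quality training data and get new projects started.”
-- Anirudh Koul
Head of Data Science and Machine Learning, Pinterest
“Doing machine learning well requires people to learn about training data. This book is worth its weight in gold.”
-- Neal Linson
Chief Data and Analytics Officer,
InCite Logix and LLM Superstar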

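Training data matters to the success or failure of a data project as much as the algorithms themselves, because most failures in AI systems relate to training data. Yet although training data is the foundation of successful AI and machine learning, there are few comprehensive resources to help you master the process.
In this hands-on guide, author Anthony Sarkis (lead engineer for the Diffgram AI training data software) shows technical professionals, managers, and subject matter experts how to work with and scale training data, while illuminating the human side of supervising machines. Engineering leaders, data engineers, and data science professionals alike will gain a solid understanding of the concepts, tools, and processes needed to succeed with training data.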

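With this book, you will learn how to:
● Work effectively with training data, including schemas, raw data, and annotations
● Transform your work, team, or organization to be more AI/ML data-centric
● Clearly explain training data concepts to other staff, team members, and stakeholders
● Design, deploy, and ship training data for production-grade AI applications
● Recognize and correct new training-data-based failure modes such as data bias
● Confidently use automation techniques to create training data more effectively
● Successfully maintain, operate, and improve training data systems of record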
Preface
1. Training Data Introduction
    Training Data Intents
    Training Data Opportunities
    Why Training Data Matters
    Training Data in the Wild
    Generative AI
    Summary
2. Getting Up and Running
    Introduction
    Getting Up and Running
    Tools Overview
    Trade-Offs
    History
    Summary
3. Schema
    Schema Deep Dive Introduction
    Labels and Attributes—What Is It?
    Spatial Representation—Where Is It?
    Relationships, Sequences, Time Series: When Is It?
    Guides and Instructions
    Relation of Machine Learning Tasks to Training Data
    General Concepts
    Summary
4. Data Engineering
    Introduction
    Raw Data Storage
    Formatting and Mapping
    Data Access
    Security
    Pre-Labeling
    Summary
5. Workflow
    Introduction
    Glue Between Tech and People
    Getting Started with Human Tasks
    Quality Assurance
    Analytics
    Models
    Dataflow
    Direct Annotation
    Summary
6. Theories, Concepts, and Maintenance
    Introduction
    Theories
    General Concepts
    Sample Creation
    Maintenance
    Training Data Management
    Summary
7. AI Transformation and Use Cases
    Introduction
    AI Transformation
    Appoint a Leader: The Director of AI Data
    Use Case Discovery
    The New “Crowd Sourcing”: Your Own Experts
    Modern Training Data Tools
    Summary
8. Automation
    Introduction
    Getting Started
    Trade-Offs
    Pre-Labeling
    Interactive Annotation Automation
    Quality Assurance Automation
    Data Discovery: What to Label
    Augmentation
    Simulation and Synthetic Data
    Media Specific
    Domain Specific
    Summary
9. Case Studies and Stories
    Introduction
    Industry
    An Academic Approach to Training Data
    Summary
Index
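Title: Training Data for Machine Learning (Reprint Edition)
Author: Anthony Sarkis
Publisher in China: Southeast University Press
Publication date: March 2024
Pages: 329
ISBN: 978-1492094524
Original title: Training Data for Machine Learning
Original publisher: O'Reilly Media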
Anthony Sarkis
 
Anthony Sarkis is the lead engineer on the Diffgram AI training data software and the CTO and founder of Diffgram. Before that, he was an R&D software engineer at Skidmore, Owings & Merrill and cofounded DriveCarma.ca.
 
 
The animals on the cover of Training Data for Machine Learning are black-tailed prairie dogs (Cynomys ludovicianus). While they are actually a type of ground squirrel, they received the name prairie dog because of the habitats they live in and because their warning calls sound similar to a dog’s bark.

Black-tailed prairie dogs are small rodents that weigh between 2 and 3 pounds and grow between 14 and 17 inches long. They have mostly tan fur that is lighter on their bellies and their namesake black tail tip. They have short, round ears, and eyes that are relatively large in comparison to the size of their bodies. Their feet have long claws, which are ideal for digging burrows into the ground.

True to their name, black-tailed prairie dogs live in a variety of grasslands and prairie in the Great Plains of North America. Their habitat usually consists of flat, dry, sparsely vegetated land, such as short grass prairie, mixed-grass prairie, sagebrush, and desert grasslands. Their expansive range is east of the Rocky Mountains in the United States and Canada to the border of Mexico.

Black-tailed prairie dogs may not be considered endangered, but they are a keystone species. They impact the diversity of vegetation, vertebrates, and invertebrates because of their foraging habits and presence as potential prey. It has been shown that grasslands inhabited by them have a higher degree of biodiversity than grasslands not inhabited by them. Prior to a large amount of habitat destruction, they used to be the most abundant species of prairie dog in North America. Many of the animals on O’Reilly covers are endangered; all of them are important to the world.
List price: 118.00 yuan