Data Science (Reprint Edition)

Publication date: June 2014

Pages: 408

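People now recognize that data can make the difference in an election or a business model, and data science as an occupation is steadily gaining ground. But how do you get started in such a broad and intricate interdisciplinary field? This book tells you what you need to know. It is insightful and is based on the lecture notes from Columbia University's data science course.
In most of these chapter-length lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you are familiar with linear algebra, probability, and statistics and have programming experience, this book is an ideal introduction to data science.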

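Topics include:
· Statistical inference, exploratory data analysis, and the data science process
· Algorithms
· Spam filters, Naive Bayes, and data wrangling
· Logistic regression
· Financial modeling
· Recommendation engines and causality
· Data visualization
· Social networks and data journalism
· Data engineering, MapReduce, Pregel, and Hadoop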

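Rachel Schutt, Senior Vice President of Data Science at News Corp, is an adjunct professor of statistics at Columbia University and a founding member of the Education Committee for the Institute for Data Sciences and Engineering.
Cathy O'Neil, a senior data scientist at Johnson Research Labs, holds a Ph.D. in mathematics from Harvard University, was a postdoc in the MIT mathematics department, and was formerly a professor at Barnard College.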

Table of Contents

preface
1. introduction: what is data science?
big data and data science hype
getting past the hype
why now?
datafication
the current landscape (with a little history)
data science jobs
a data science profile
thought experiment: meta-definition
ok, so what is a data scientist, really?
in academia
in industry
2. statistical inference, exploratory data analysis, and the data science process
statistical thinking in the age of big data
statistical inference
populations and samples
populations and samples of big data
big data can mean big assumptions
modeling
exploratory data analysis
philosophy of exploratory data analysis
exercise: eda
the data science process
a data scientist's role in this process
thought experiment: how would you simulate chaos?
case study: realdirect
how does realdirect make money?
exercise: realdirect data strategy
3. algorithms
machine learning algorithms
three basic algorithms
linear regression
k-nearest neighbors (k-nn)
k-means
exercise: basic machine learning algorithms
solutions
summing it all up
thought experiment: automated statistician
4. spam filters, naive bayes, and wrangling
thought experiment: learning by example
why won't linear regression work for filtering spam?
how about k-nearest neighbors?
naive bayes
bayes law
a spam filter for individual words
a spam filter that combines words: naive bayes
fancy it up: laplace smoothing
comparing naive bayes to k-nn
sample code in bash
scraping the web: apis and other tools
jake's exercise: naive bayes for article classification
sample r code for dealing with the nyt api
5. logistic regression
thought experiments
classifiers
runtime
you
interpretability
scalability
m6d logistic regression case study
click models
the underlying math
6. time stamps and financial modeling
7. extracting meaning from data
8. recommendation engines: building a user-facing data product at scale
9. data visualization and fraud detection
10. social networks and data journalism
11. causality
12. epidemiology
13. lessons learned from data competitions: data leakage and model evaluation
14. data engineering: mapreduce, pregel, and hadoop
15. the students speak
16. next-generation data scientists, hubris, and ethics
index

Title: Data Science (Reprint Edition)

Authors: Rachel Schutt, Cathy O'Neil

Publisher in China: Southeast University Press

ISBN: 978-7-5641-4984-0

Original title: Doing Data Science

Original publisher: O'Reilly Media

Rachel Schutt

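Senior Vice President of Data Science at News Corp, adjunct professor in the Department of Statistics at Columbia University, and senior research scientist at Johnson Research Labs. She is also a founding member of the Education Committee for the Institute for Data Sciences and Engineering at Columbia University. She previously spent several years at Google Research, where she designed algorithm prototypes and built models to understand user behavior.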

Cathy O'Neil

Purchase options

List price: 74.00 yuan
