Data Science and Data Scientist | HackThatCORE

Data Science and Data Scientist

Image Source: KDnuggets

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.
Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.
Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

Modern Data Scientist

Data Scientist, the most beautiful job of the 21st century, calls for a mix of skills at the intersection of Mathematics, Statistics, Computer Science, Communication and Business. Finding a Data Scientist is hard. Finding people who understand what a data scientist is, is equally hard. So here is a brief outline of who the modern data scientist really is.

  • Maths and Statistics

    • Machine Learning
    • Statistical Modeling
    • Experiment Design
    • Bayesian Inference
    • Supervised learning: decision trees, random forests, logistic regression
    • Unsupervised learning: clustering, dimensionality reduction
    • Optimization: gradient descent and variants
  • Domain Knowledge and Soft Skills

    • Passionate about the business
    • Curious about data
    • Influence without authority
    • Hacker Mindset
    • Problem Solver
    • Strategic, proactive, creative, innovative and collaborative
  • Programming and Database

    • Computer Science Fundamentals
    • Scripting language, for example Python
    • Statistical computing package, for example R
    • Databases, both SQL and NoSQL
    • Relational Algebra
    • Parallel Databases and parallel query processing
    • MapReduce concepts
    • Hadoop and Hive/Pig
    • Custom reducers
    • Experience with XaaS (anything as a service) platforms such as AWS
  • Communication and Visualization

    • Able to engage with senior management
    • Storytelling skills
    • Translate data-driven insights into decisions and actions
    • Visual art design
    • R packages such as ggplot2 or lattice
    • Knowledge of a visualization tool, for example Flare, D3.js or Tableau
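
To make the "MapReduce concepts" and "Custom reducers" items above a little more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code: the function names (map_words, shuffle, sum_reducer, map_reduce) and the sample lines are made up for illustration, and a real cluster would spread the map and reduce phases across many machines.

```python
from __future__ import annotations

from collections import defaultdict


def map_words(line: str):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs) -> dict:
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def sum_reducer(key: str, values: list) -> int:
    """Reduce phase: collapse the values of one key into a total."""
    return sum(values)


def map_reduce(lines, mapper, reducer) -> dict:
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return {key: reducer(key, values) for key, values in grouped.items()}


# Hypothetical input, chosen only to keep the example small.
lines = ["data science uses data", "science needs data"]
counts = map_reduce(lines, map_words, sum_reducer)
print(counts)  # {'data': 3, 'science': 2, 'uses': 1, 'needs': 1}
```

A custom reducer is simply a different function passed in the reducer slot. For example, `lambda key, values: max(values)` would keep the largest value per key instead of the sum.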
