MLB 1. lecture - Intro to Machine Learning

Data science - process of extracting knowledge and insights from data (using statistics, computer science and domain expertise)
- the broadest topic in MLB
Data analytics - process of examining, cleaning, transforming and modeling data in order to discover useful information and support decision-making
- the most manual of these disciplines
Data mining - process of discovering patterns, anomalies and relationships in large datasets
Machine learning - process of building algorithms that can learn from data and predict future values without being explicitly programmed to do so
Artificial intelligence - the art of creating systems to perform tasks where human intelligence is required (reasoning, learning and perception)
General AI - theoretical form of AI that can perform any task a human can, with general reasoning and learning abilities across domains
Big Data - see Big Data
Data engineering - discipline of designing and building systems that can collect, store and process big data efficiently and reliably
- really important for any data analysis or machine learning
Machine learning engineering - the practice of implementing, deploying and maintaining ML models in production environments
Data governance - set of policies, processes and standards that ensure data is accurate, secure and used responsibly across the whole organization

semi-structured data - halfway between structured and unstructured data :)
it gives some structure to unstructured data (e.g. text labelling)
- but the structure is not as rigid as in structured data
XML, HTML, JSON, YAML
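
A minimal sketch of what "some structure, but not rigid" can look like with JSON. The records and field names are made up for illustration: both records describe the same kind of thing, but they do not share the same set of fields, so turning them into a table means filling the gaps.

```python
import json

# Hypothetical records: same kind of entity, different sets of fields.
raw = """
[
  {"id": 1, "name": "Alice", "tags": ["vip"]},
  {"id": 2, "name": "Bob", "email": "bob@example.com"}
]
"""

records = json.loads(raw)

# Collect every key that appears in at least one record.
all_keys = sorted({key for record in records for key in record})

# Flatten into fixed-width rows; fields a record does not have become None.
rows = [[record.get(key) for key in all_keys] for record in records]

print(all_keys)
for row in rows:
    print(row)
```

The keys give the data a shape, but nothing forces every record to have every key, which is what separates it from a table with a fixed schema.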

features
- = input variables, could be boolean features, categorical features etc.
- used as input variables to predict the outcome I am interested in
instance/sample
- individual observation in the dataset, described by a set of features
labels/targets
- the predicted/outcome variable
- label in classification, target in regression
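
A small sketch of how these three terms map onto data, assuming a made-up patient dataset (the feature names and values are hypothetical). Each row of X is one instance, each column is one feature, and y holds one label per instance.

```python
# Hypothetical toy dataset: each row of X is one instance (one patient).
feature_names = ["age", "has_fever", "blood_type"]

X = [
    [34, True, "A"],   # numerical, boolean and categorical features
    [51, False, "O"],
    [27, True, "B"],
]
y = [1, 0, 1]  # labels: 1 - is sick, 0 - is not sick

# Pair every instance with its label and show it feature by feature.
for instance, label in zip(X, y):
    described = dict(zip(feature_names, instance))
    print(described, "->", label)
```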

supervised learning - predicting the output variable (target/label) based on previous, already labelled data
labelling past data sometimes requires human labour

classification - predicting the category or class label based on previously labelled data
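we have binary classification (1 - is sick, 0 - is not sick) or multi-class classification (more than two classes)
- multi-label classification, where one instance can carry several labels at once, is a separate setting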
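
A minimal sketch of binary classification on the sick / not sick example. The 1-nearest-neighbour rule is my own choice for illustration, not a method from the lecture, and the measurements are invented. A new instance simply gets the label of the closest labelled instance.

```python
import math


def predict_1nn(train_X, train_y, query):
    """Return the label of the training instance closest to `query`."""
    distances = [math.dist(row, query) for row in train_X]
    nearest = distances.index(min(distances))
    return train_y[nearest]


# Hypothetical features: [body temperature in °C, resting heart rate]
train_X = [[36.6, 62], [36.8, 70], [38.9, 95], [39.4, 101]]
train_y = [0, 0, 1, 1]  # 1 - is sick, 0 - is not sick

sick_guess = predict_1nn(train_X, train_y, [39.0, 96])
healthy_guess = predict_1nn(train_X, train_y, [36.7, 65])
print(sick_guess, healthy_guess)
```

The features are used unscaled here, so heart rate dominates the distance; a real pipeline would normally scale them first.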

regression - predicting a continuous numerical value (based on previous data with continuous numerical outcomes)
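
A minimal sketch of predicting a continuous value, assuming one feature and an ordinary least-squares line (my choice of technique, the lecture does not name one). The flat sizes and prices are invented and deliberately lie on a perfect line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: flat size in m² (feature) and price in thousands (target).
sizes = [30, 50, 70, 90]
prices = [90, 150, 210, 270]  # exactly 3 * size, so the fit is exact

slope, intercept = fit_line(sizes, prices)
predicted = slope * 60 + intercept  # prediction for an unseen 60 m² flat
print(slope, intercept, predicted)
```

Unlike in classification, the output is not one of a fixed set of classes but any number on a continuous scale.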
