What Are Artificial Neural Networks?
- Artificial Neural Networks (ANNs) are computational models inspired by the human brain.
- They consist of interconnected nodes or "neurons" that work together to process and learn from input data.
- ANNs are used for tasks like classification, regression, and pattern recognition.
Key Components of ANNs
Nodes (Neurons)
- Role: Fundamental units of the network.
- Each neuron:
  - Receives inputs.
  - Multiplies each input by a weight and adds a bias.
  - Passes the result through an activation function.
  - Produces an output that feeds into the next layer (sketched in code below).
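A minimal sketch of this computation in Python (the function and variable names are illustrative, not from any particular library; sigmoid is introduced below):

```python
import math

def sigmoid(z):
    # Squash the weighted sum into the range (0, 1).
    return 1 / (1 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, passed through the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# A neuron with two inputs:
print(neuron(inputs=[0.5, -1.2], weights=[0.8, 0.3], bias=0.1))
```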
Weights
- Numerical values that determine the strength/importance of each input connection.
- Adjusted during training to minimize errors.
Bias
- A constant term added to the weighted sum.
- Helps the network shift activation functions to better fit data.
- Without bias, some patterns may be impossible to learn.
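To see why, consider the pre-activation value of a neuron with two inputs:

$$z = w_1 x_1 + w_2 x_2 + b$$

With $b = 0$, the decision boundary $z = 0$ is forced to pass through the origin, so the neuron could never, for example, output a non-zero value when all inputs are 0. The bias lets the boundary shift to wherever the data requires.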
Activation Function
- Decides whether a neuron “fires” and introduces non-linearity.
- Common examples:
  - Sigmoid: Output between 0 and 1.
  - ReLU (Rectified Linear Unit): Outputs the input if positive, otherwise 0.
  - Tanh: Output between –1 and 1.
- Without activation functions, ANNs would act like simple linear models.
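A quick sketch of the three activations above, implemented with NumPy (assumed here only for convenience):

```python
import numpy as np

def sigmoid(z):
    # Smoothly maps any real number into (0, 1).
    return 1 / (1 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged; clips negatives to 0.
    return np.maximum(0, z)

def tanh(z):
    # Maps any real number into (-1, 1), centered at 0.
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # approx [0.119, 0.5, 0.881]
print(relu(z))     # [0., 0., 2.]
print(tanh(z))     # approx [-0.964, 0., 0.964]
```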
Layers of an ANN
- Input Layer: Receives raw data (features).
- Hidden Layers: Perform computations and extract patterns.
  - Multiple hidden layers = deep learning.
- Output Layer: Produces final prediction (e.g., class label, probability, number).
- ANNs process data step by step: Inputs → Weights + Bias → Activation → Layers → Output.
- Training adjusts weights and biases so the network learns meaningful patterns.
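Putting those pieces together, here is a sketch of a forward pass through one hidden layer; the layer sizes and random weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: 3 input features -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x):
    # Inputs -> weights + bias -> activation, layer by layer.
    hidden = relu(x @ W1 + b1)        # hidden layer
    return sigmoid(hidden @ W2 + b2)  # output layer

x = np.array([0.2, -0.5, 1.0])
print(forward(x))  # the network's prediction for this input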
How ANNs Work
- Forward Propagation:
  - Data moves from the input layer through hidden layers to the output layer.
- Activation Functions:
  - Introduce non-linearity, allowing the network to learn complex patterns.
- Backpropagation:
  - The network adjusts weights and biases based on the error between predicted and actual outputs (a worked update step follows below).
- When designing an ANN, choose activation functions carefully.
- ReLU is popular for hidden layers due to its simplicity and efficiency, while sigmoid is often used in output layers for binary classification.
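To make backpropagation concrete, here is a worked sketch of one gradient-descent update for a single sigmoid neuron with squared-error loss (all numbers are illustrative):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# One training example and the current parameters.
x, target = 1.5, 1.0
w, b, lr = 0.4, 0.0, 0.1   # weight, bias, learning rate

# Forward pass.
pred = sigmoid(w * x + b)

# Backward pass: chain rule through the loss and the sigmoid.
# Loss = 0.5 * (pred - target)**2, so dLoss/dpred = pred - target.
dz = (pred - target) * pred * (1 - pred)  # dLoss/dz
w -= lr * dz * x                          # dLoss/dw = dz * x
b -= lr * dz                              # dLoss/db = dz

print(w, b)  # parameters nudged to reduce the error
```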
The Perceptron: Building Block of ANNs
What Is a Perceptron?
- A perceptron is the simplest form of a neural network, consisting of a single node.
- It can classify linearly separable data by applying an activation function to a weighted sum of the inputs.
- Think of a perceptron as a decision-maker.
- It takes inputs, weighs their importance, adds a bias, and then decides whether to activate based on an activation function.
Structure of a Perceptron
- Inputs:
  - Data features (e.g., $I_1$, $I_2$).
- Weights:
  - $W_1$, $W_2$, etc.
- Bias:
  - A constant value added to the weighted sum.
- Activation Function:
  - Transforms the weighted sum into an output.
- Consider a perceptron that classifies emails as spam or not spam.
- The inputs might be features like the presence of certain keywords, and the weights determine the importance of each feature.
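A sketch of that spam perceptron, with made-up keyword features and hand-picked weights (a real system would learn them from data):

```python
def step(z):
    # Classic perceptron activation: fire (1) or stay silent (0).
    return 1 if z > 0 else 0

def perceptron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# Hypothetical features: [contains "free", contains "winner", known sender]
email = [1, 1, 0]
weights = [0.6, 0.5, -1.0]  # a known sender strongly suggests "not spam"
bias = -0.4

print(perceptron(email, weights, bias))  # 1 -> classified as spam
```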
Activation Functions
- Step Function:
  - Outputs 1 if the weighted sum exceeds a threshold, otherwise 0.
- Sigmoid Function:
  - Maps the output between 0 and 1, useful for probabilities.
- ReLU (Rectified Linear Unit):
  - Outputs the input if positive, otherwise 0.
- A single perceptron can only classify linearly separable data.
- For more complex patterns, we need multi-layer networks.
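The limitation can be seen directly with the classic perceptron learning rule, sketched below: it converges on AND, which is linearly separable, but can never reach full accuracy on XOR, which is not:

```python
def train_perceptron(data, epochs=20, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            # Perceptron learning rule: nudge parameters toward the target.
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
for (x1, x2), target in AND:
    pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
    print((x1, x2), pred == target)  # True on all four rows

# Swap in XOR data and no weight setting ever works:
# no single straight line separates XOR's two classes.
```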
Multi-Layer Perceptrons (MLPs)
What Is a Multi-Layer Perceptron?
A Multi-Layer Perceptron (MLP) is a type of ANN with one or more hidden layers. It can model complex, non-linear patterns in data.
Structure of an MLP
- Input Layer:
  - Receives raw data.
- Hidden Layers:
  - Perform computations to extract features.
- Output Layer:
  - Produces the final prediction.
- The hidden layers are what give an MLP its power.
- They allow the network to learn complex relationships that a single perceptron cannot capture.
How MLPs Work
- Forward Propagation:
  - Data flows through the network, with each layer applying weights, biases, and activation functions.
- Backpropagation:
  - The network adjusts weights based on the error, using techniques like gradient descent (demonstrated below on XOR).
- When designing an MLP, consider the number of hidden layers and nodes.
- More layers can capture complex patterns but may increase the risk of overfitting.
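As a concrete illustration of both steps, the sketch below trains a tiny MLP on XOR, the textbook problem a single perceptron cannot solve; the architecture and hyperparameters are our own choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # Forward propagation.
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)

    # Backpropagation of squared-error loss via the chain rule.
    d_out = (pred - y) * pred * (1 - pred)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_hid)
    b1 -= lr * d_hid.sum(axis=0, keepdims=True)

print(pred.round(2))  # should approach [[0], [1], [1], [0]]
```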
Modeling Complex Patterns with MLPs
Why Use MLPs?
- MLPs are powerful because they can model non-linear relationships in data.
- This makes them ideal for tasks like image recognition, natural language processing, and more.
Image Classification
- Input Layer:
  - Receives pixel values of an image.
- Hidden Layers:
  - Extract features like edges, shapes, and textures.
- Output Layer:
  - Classifies the image (e.g., cat or dog).
The ability to learn hierarchical features is what makes MLPs so effective in complex tasks.
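As one possible end-to-end illustration, the sketch below trains scikit-learn's MLPClassifier on its built-in 8x8 digits dataset; the layer roles mirror the list above, though the dataset and hyperparameters are our own choices:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Input layer: 64 pixel values per 8x8 digit image.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One hidden layer of 64 ReLU units; the output layer picks one of 10 digits.
clf = MLPClassifier(hidden_layer_sizes=(64,), activation="relu",
                    max_iter=500, random_state=0)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))  # typically well above 0.9 on this dataset
```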