Scikit-learn, imported in code as sklearn, is a Python library for machine learning. It is built on top of other scientific computing libraries such as NumPy, SciPy, and matplotlib. Scikit-learn provides tools for machine learning and data analysis, including classification, regression, clustering, and dimensionality reduction. This article walks through examples of classification and regression with scikit-learn.
Here are some examples of using scikit-learn in machine learning tasks, along with code snippets. The snippets assume that X_train, y_train, and X_test already hold your training features, training targets, and test features.
Classification
Logistic Regression
from sklearn.linear_model import LogisticRegression
# create logistic regression model
model = LogisticRegression()
# fit the model on the training data
model.fit(X_train, y_train)
# predict class labels for the test set
y_pred = model.predict(X_test)
Decision Tree
from sklearn.tree import DecisionTreeClassifier
# create decision tree model
model = DecisionTreeClassifier()
# fit the model on the training data
model.fit(X_train, y_train)
# predict class labels for the test set
y_pred = model.predict(X_test)
Random Forest
from sklearn.ensemble import RandomForestClassifier
# create random forest model
model = RandomForestClassifier()
# fit the model on the training data
model.fit(X_train, y_train)
# predict class labels for the test set
y_pred = model.predict(X_test)
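The classification snippets above leave X_train, y_train, and X_test undefined. The sketch below is one way to run all three classifiers end to end. The synthetic dataset from make_classification, the max_iter=1000 setting, and the random_state values are assumptions chosen for illustration and do not come from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (an assumption for this sketch, not the article's data)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three classifiers shown above; max_iter is raised so the
# logistic regression solver has room to converge
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out test set
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: {scores[name]:.3f}")
```

Because every scikit-learn estimator exposes the same fit() and predict() methods, swapping one classifier for another only changes the line that constructs the model.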
Regression
Linear Regression
from sklearn.linear_model import LinearRegression
# create linear regression model
model = LinearRegression()
# fit the model on the training data
model.fit(X_train, y_train)
# predict target values for the test set
y_pred = model.predict(X_test)
Ridge Regression
from sklearn.linear_model import Ridge
# create ridge regression model
model = Ridge(alpha=0.1)
# fit the model on the training data
model.fit(X_train, y_train)
# predict target values for the test set
y_pred = model.predict(X_test)
Support Vector Regression
from sklearn.svm import SVR
# create support vector regression model
model = SVR(kernel='linear')
# fit the model on the training data
model.fit(X_train, y_train)
# predict target values for the test set
y_pred = model.predict(X_test)
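The regression snippets also rely on training and test arrays that are never defined, and they stop at predict() without measuring the result. The following sketch fills both gaps under stated assumptions: the data comes from make_regression (synthetic, not from the article), and mean squared error and R² are used as example metrics.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic linear data with Gaussian noise (an assumption for this sketch)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three regressors shown above, with the same hyperparameters
models = {
    "linear regression": LinearRegression(),
    "ridge (alpha=0.1)": Ridge(alpha=0.1),
    "SVR (linear kernel)": SVR(kernel="linear"),
}

# Fit each model, then score its predictions on the test set
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results[name] = (mse, r2)
    print(f"{name}: MSE={mse:.2f}, R^2={r2:.3f}")
```

A lower mean squared error and an R² closer to 1 indicate a better fit. SVR is sensitive to feature and target scale, so on real data it is usually worth standardizing the inputs before fitting it.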
The Iris dataset is a classic dataset used in machine learning. It contains 150 samples, 50 from each of three iris species, each with four features (sepal length, sepal width, petal length, and petal width) and a target variable (the species of the iris). The example below loads this dataset with the load_iris() function from scikit-learn and trains a k-nearest neighbors (KNN) classifier on it:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load iris dataset
iris = load_iris()
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
# Initialize KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
# Fit classifier to training data
knn.fit(X_train, y_train)
# Make predictions on testing data
y_pred = knn.predict(X_test)
# Calculate accuracy of predictions
accuracy = accuracy_score(y_test, y_pred)
# Print accuracy
print(f"Accuracy: {accuracy}")
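In this code, we first load the iris dataset using scikit-learn's load_iris() function. We then split the data into training and testing sets using the train_test_split() function, holding out 20% of the samples for testing (test_size=0.2) and fixing random_state=42 so that the split is reproducible.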
Next, we initialize a KNN classifier with n_neighbors=3 and fit it to the training data using the fit() method. We then make predictions on the testing data using the predict() method and calculate the accuracy of our predictions using the accuracy_score() function. Finally, we print the accuracy of our classifier.
Note that you can experiment with different values for n_neighbors to see how it affects the accuracy of the classifier. Additionally, you can use other classification algorithms from scikit-learn, such as decision trees or logistic regression, to build classifiers for the iris dataset.
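As a rough sketch of that experiment, the code below reuses the same split as the example above, tries several values of n_neighbors, and then fits two other classifiers for comparison. The specific values of k, the max_iter=1000 setting, and the random_state on the decision tree are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Same dataset and split as in the KNN example above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Try several neighborhood sizes (arbitrary choices for illustration)
accuracies = {}
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracies[k] = accuracy_score(y_test, knn.predict(X_test))
    print(f"n_neighbors={k}: accuracy={accuracies[k]:.3f}")

# Other classifiers plug into the same fit/predict workflow
others = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=42),
}
other_accuracies = {}
for name, clf in others.items():
    clf.fit(X_train, y_train)
    other_accuracies[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy={other_accuracies[name]:.3f}")
```

The test set here has only 30 samples, so small differences between settings may be noise. Cross-validation, for example with cross_val_score from sklearn.model_selection, tends to give a steadier comparison.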