First and foremost, it is important to have a solid grasp of the data you have. What this means is that you should plot it, analyse it, and look for patterns, trends, or relationships. There are various ways to go about this: the easiest is to plot two features against each other and look for trends in how they are correlated; another is to use PCA (Principal Component Analysis). This step varies greatly based on the type of data you have, for example classified (labelled) versus unclassified (unlabelled) data.
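To make that concrete, here is a minimal sketch of this kind of exploration using pandas, matplotlib, and scikit-learn. The file name and column names are placeholders standing in for whatever your own dataset contains.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.decomposition import PCA

# Load the dataset; "data.csv" and the column names below are placeholders.
df = pd.read_csv("data.csv")

# Easiest check: plot two features against each other and eyeball the trend.
df.plot.scatter(x="feature_a", y="feature_b")
plt.show()

# PCA: project all numeric features onto the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(df.select_dtypes("number"))
plt.scatter(components[:, 0], components[:, 1])
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()

# How much of the total variance the first two components capture.
print(pca.explained_variance_ratio_)
```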
Once you have a grasp of your data, you need to know what problem you're looking to solve, and based on that you decide which parameters/features are important to use. If you're simply looking to predict an outcome given specific features, you want to determine the model best suited for the task given the data you now have. There are many ways of going about this, and once more it all depends on the specifics of the problem. Common models include: Linear Regression, Logistic Regression, SVMs, K-Means, KNNs, Decision Trees, Random Forests, Gradient Boosting, Extreme Gradient Boosting, etc.
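One common way to narrow the field is to cross-validate a few candidate models on the same data and compare their scores. The sketch below uses a synthetic dataset as a stand-in for your own cleaned features, and the two candidates are picked arbitrarily for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for your cleaned feature matrix and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
}

# 5-fold cross-validation gives a rough ranking of the candidates.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```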
Next, after settling on the model you believe to be right, you train it on the cleaned feature data. There are plenty of libraries for this; a really popular one is scikit-learn.
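With scikit-learn, training usually boils down to a single fit call. A minimal sketch, again on synthetic stand-in data, with a held-out test set so the model is judged on examples it never saw:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for your cleaned features and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 20% of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```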
From there, you save the model, feed it the parameters you decided it would take, and it will predict an output for you.
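scikit-learn models are typically saved with joblib. Here is a sketch of the save/load/predict round trip; the file name is arbitrary, and a small model is trained inline just so the example is self-contained.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small model on synthetic data so the sketch runs on its own.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Persist the trained model to disk.
joblib.dump(model, "model.joblib")

# Later (even from a different script), load it back and hand it a new
# row of feature values to get a prediction.
loaded = joblib.load("model.joblib")
print(loaded.predict(X[:1]))
```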
Things to keep in mind: too little data, or too many features, will result in overfitting. Overfitting means your model works extremely well on your current dataset but fails as soon as it is given a new, external data point. Some overfitting is generally inevitable and not a bad thing, but too much and your model will be of limited use.
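You can see overfitting directly by comparing a model's score on the data it was trained on against its score on held-out data. In this sketch, an unconstrained decision tree on a small, wide synthetic dataset (few samples, many features) memorises the training set:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A small, wide dataset: few samples, many features.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained decision tree is free to memorise the training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))  # usually 1.0
print("test accuracy:", tree.score(X_test, y_test))     # noticeably lower

# A large gap between the two scores is the classic symptom of overfitting.
```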
I have introduced this because we shall be doing a bit of machine learning using shapefiles.
see you at the top