Projects are an integral part of INSOFE programs. We strongly believe that for our students to be able to effectively apply the concepts learned in the classroom (We ARE Applied Engineering, after all, aren’t we?), a project that solves a pressing business problem with real data sets is the BEST (nay, ONLY) way to go. These projects immediately give them the opportunity to deal with real-world business problems and prepare them to handle the most complex projects in the quickest time possible.
Students spend 150-1000 hours on the project, depending on whether they are pursuing the CPEE or the Master's program. While we publish our results in journals, we also feel that the industry would benefit from these efforts sooner if we shared some insights into our work through this informal medium. This is because most of our projects have a very strong business case behind them, unlike typical engineering theses. That is why we started this blog.
The high-level goal of our work is to design powerful forecasting and classification techniques for high-dimensional, real-world data. We use randomization and smart democratization (letting many weak classifiers vote on the final answer) as the main tools to tackle this problem.
We are extending the methodology of “random forests” to other forecasting and classification techniques such as Logistic Regression, K-Nearest Neighbors, and Neural Networks. We apply ideas like Genetic Algorithms and Simulated Annealing to select better weak classifiers. We then experiment systematically with the variables of the weak classifiers to get better forecasts. Even while voting, we systematically vary the weight of each weak classifier based on its accuracy. Most importantly, we analyze which settings work best for which type of data and investigate why. Of course, we always benchmark our methods against the more traditional alternatives.
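To make the idea concrete, here is a minimal sketch of a random-forest-style ensemble built from K-Nearest Neighbors weak classifiers with accuracy-weighted voting. It is an illustration under our own assumptions, not the exact method used in the student projects: the function names (`knn_predict`, `fit_ensemble`, `predict`, `make_toy_data`), the use of out-of-bag accuracy as the voting weight, and the synthetic data are all hypothetical choices made for this example.

```python
import random
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, x, features, k=3):
    """Weak classifier: k-NN vote using only the given feature indices."""
    nearest = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_ensemble(X, y, n_members=25, n_features=2, k=3, seed=0):
    """Build weak classifiers on bootstrap samples and random feature subsets.

    Assumption: each member's voting weight is its out-of-bag accuracy.
    """
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]  # rows left out
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        bag_X = [X[i] for i in idx]
        bag_y = [y[i] for i in idx]
        if oob:
            correct = sum(
                knn_predict(bag_X, bag_y, X[i], feats, k) == y[i] for i in oob
            )
            weight = correct / len(oob)
        else:
            weight = 1.0  # no held-out rows to score on; fall back to a plain vote
        members.append((bag_X, bag_y, feats, weight))
    return members


def predict(members, x, k=3):
    """Weighted vote: each member adds its weight to the class it predicts."""
    scores = defaultdict(float)
    for bag_X, bag_y, feats, weight in members:
        scores[knn_predict(bag_X, bag_y, x, feats, k)] += weight
    return max(scores, key=scores.get)


def make_toy_data(n_per_class=60, seed=1):
    """Synthetic data: features 0 and 1 separate the classes, 2 and 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        center = 4.0 * label
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1), rng.gauss(center, 1),
                      rng.gauss(0, 1), rng.gauss(0, 1)])
            y.append(label)
    return X, y


X, y = make_toy_data()
ensemble = fit_ensemble(X, y)
print(predict(ensemble, [4.0, 4.0, 0.0, 0.0]))
```

Members that happen to draw only the noise features score close to chance on their out-of-bag rows, so they carry less weight in the vote than members that see an informative feature. Swapping `knn_predict` for another weak learner, or changing how the weight is computed, is the kind of systematic experiment described above.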
All the students, as part of their projects:
- Take up data from an industry vertical and capture the business advantage of accurate forecasting
- Analyze and prepare the data statistically
- Take one or more advanced forecasting techniques and experiment with variables carefully and systematically
- Document and visualize the improvements
We have taken large data sets from real-world sources (Kaggle, CrowdANALYTIX and a few other open sources) for this experimentation.
In this section, we give you a feel for the projects they are working on, with regular updates on their progress and the results.