Microsoft is detailing how it uses machine learning models to handle bugs in its software and services. “47,000 developers generate nearly 30,000 bugs a month,” explains Scott Christiansen, a senior security program manager at Microsoft. The software maker tracks these bugs across GitHub and Azure DevOps repositories, but that is a lot of issues to track with traditional labeling and prioritization alone.
Microsoft is now using nearly 20 years of historical data across 13 million work items and bugs to create a machine-learning model that can separate security and non-security bugs 99 percent of the time. It’s a model that’s designed to help developers accurately identify and prioritize critical security issues that need fixing.
“Our goal was to build a machine learning system that classifies bugs as security/non-security and critical/non-critical with a level of accuracy that is as close as possible to that of a security expert,” explains Christiansen. To train the model, Microsoft fed it bugs that had already been labeled as security or non-security, and made sure the data wasn’t too noisy. The model then learned how to classify security bugs and apply a severity label such as critical, important, or low-impact to each one.
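The two-step approach described above (first security versus non-security, then a severity label for the security bugs) can be sketched in a few lines of Python. This is only an illustration, not Microsoft's model: the naive Bayes classifier, the `NaiveBayes`, `train`, and `classify` names, and the toy bug titles in `TRAINING` are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes over whitespace-separated tokens."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                # Laplace (add-one) smoothing for unseen tokens
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (bug title, security label, severity or None).
TRAINING = [
    ("buffer overflow in image parser allows remote code execution", "security", "critical"),
    ("sql injection in login form", "security", "critical"),
    ("xss in comment preview", "security", "important"),
    ("verbose error message leaks server version", "security", "low-impact"),
    ("button misaligned on settings page", "non-security", None),
    ("typo in installation docs", "non-security", None),
    ("crash when opening empty file", "non-security", None),
    ("slow search on large folders", "non-security", None),
]


def train(rows):
    # Stage 1 learns security vs. non-security from every labeled bug.
    stage1 = NaiveBayes().fit([r[0] for r in rows], [r[1] for r in rows])
    # Stage 2 learns severity from the security bugs only.
    security_rows = [r for r in rows if r[1] == "security"]
    stage2 = NaiveBayes().fit(
        [r[0] for r in security_rows], [r[2] for r in security_rows]
    )
    return stage1, stage2


def classify(title, stage1, stage2):
    kind = stage1.predict(title)
    severity = stage2.predict(title) if kind == "security" else None
    return kind, severity


stage1, stage2 = train(TRAINING)
print(classify("sql injection in admin login", stage1, stage2))
print(classify("typo on settings page", stage1, stage2))
```

With this toy data, the first title comes back as a critical security bug and the second as a non-security bug with no severity. A production system would need far more data and a stronger model, but the shape is the same: severity is only predicted for bugs that the first stage flags as security issues.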
Security experts and data scientists worked together at Microsoft to create the model, ensuring that it could be monitored in production and that a random sampling of bugs is manually reviewed. The model is also continually retrained with new data that’s reviewed by Microsoft’s security experts. With this machine learning model, Microsoft says it now accurately identifies security bugs 99 percent of the time, and labels them correctly 97 percent of the time.
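The manual review step can be pictured as drawing a random sample of the model's predictions and checking how often an expert agrees with them. The sketch below is a guess at the general idea, not Microsoft's process: the `sample_for_review` and `agreement` helpers and the 5 percent default review rate are assumptions chosen for illustration.

```python
import random


def sample_for_review(predictions, rate=0.05, seed=None):
    """Pick a random subset of model predictions for human review.

    `rate` is a hypothetical review fraction; at least one item is
    returned whenever there is anything to review.
    """
    if not predictions:
        return []
    rng = random.Random(seed)  # seeded so a sample can be reproduced
    k = max(1, round(len(predictions) * rate))
    return rng.sample(predictions, k)


def agreement(model_labels, expert_labels):
    """Fraction of sampled bugs where the expert agrees with the model."""
    if not model_labels or len(model_labels) != len(expert_labels):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(m == e for m, e in zip(model_labels, expert_labels))
    return matches / len(model_labels)


# Usage with made-up bug IDs and labels.
bug_ids = list(range(200))
to_review = sample_for_review(bug_ids, rate=0.05, seed=0)
print(len(to_review))  # 10 bugs go to a human reviewer

model = ["security", "non-security", "security", "security"]
expert = ["security", "non-security", "non-security", "security"]
print(agreement(model, expert))  # 0.75
```

Tracking this agreement figure over time is one simple way to notice when a model in production starts to drift, and the expert-corrected labels can be fed back in as new training data.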
It’s unusual for a company the size of Microsoft to reveal how many bugs its developers generate on a monthly basis, let alone how it tackles them. Microsoft is now planning to open source its methodology on GitHub, allowing other companies with similar data sets to implement a similar model. If you’re interested in learning more about Microsoft’s machine learning techniques, the company has published an academic paper with all the details.