Improved storage optimisation and retrieval time for SCANOSS
- Customer: SCANOSS
- Industry: Information Technology
- Service: Cloud Storage
- Segment: SMB
- Author: Jodie Rhodes
At a glance
SCANOSS transformed the SCA space with a 100% Open Source platform, enabling developers to create compliant code from the start.
Challenge
Solution
Firemind partnered with SCANOSS to migrate their infrastructure to AWS, enhancing IOPS, storage efficiency, and retrieval speed.
Services used
- Amazon S3
- Amazon EC2
- Amazon VPC
- AWS CloudFormation
Outcomes
- 65% reduction in codebase scanning times
- Significant cost savings
Business challenges
Unique challenges for SCANOSS
The challenges SCANOSS faced were unusual compared with other data retrieval businesses. As one of the first companies to provide completely open source SCA tooling, they maintain an extremely large dataset of 11 terabytes, built by indexing the majority of public open source code (from sources such as GitHub, Bitbucket, GitLab and RhodeCode).
For SCANOSS to scan this open source material and produce results in a timely manner, they needed extremely high IOPS (input/output operations per second). Sustaining that volume of reads and writes puts heavy strain on the storage layer and drives up cost.
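To make the IOPS figure concrete, the following is a minimal sketch of how random-read operations per second could be estimated against a flat file. It is not the benchmark used on this project; the function name `measure_read_iops`, the 4 KiB block size and the scratch file are all assumptions made for illustration, and because the operating system's page cache is still in play, the number it prints will overstate what the underlying disk can deliver.

```python
import os
import random
import tempfile
import time


def measure_read_iops(path: str, block_size: int = 4096, operations: int = 2000) -> float:
    """Issue random block-sized reads against a file and return reads per second."""
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file must be at least one block long")
    max_block = size // block_size - 1
    # buffering=0 bypasses Python's own read buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(operations):
            f.seek(random.randint(0, max_block) * block_size)
            f.read(block_size)
        elapsed = time.perf_counter() - start
    return operations / elapsed


if __name__ == "__main__":
    # Hypothetical 8 MiB scratch file standing in for a flat-file dataset.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
    try:
        print(f"{measure_read_iops(tmp.name):,.0f} read ops/sec (page cache included)")
    finally:
        os.remove(tmp.name)
```

For a realistic comparison between storage options, a dedicated tool that can bypass the cache and control queue depth would be a better fit than a script like this.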
Solution
Architecting a solution on AWS
The focus of this project was to test storage optimised compute options in Amazon Web Services (AWS), with the ultimate goal of fully migrating to AWS. To achieve this, we had to prove the viability of a new architecture, one that could outperform their current production environment and workflows.
SCANOSS were running on Microsoft Azure with a dedicated disk and flat files. Given the very high IOPS required, we knew that managed AWS data services (such as Amazon RDS, Amazon Aurora and Amazon DynamoDB) would not be able to match the performance they were already getting while also delivering speed and cost benefits. Instead, we took the route of using Amazon S3 combined with Amazon EC2 I3 instances.
Amazon EC2 I3 instances are the next generation of Storage Optimised instances for high transaction, low latency workloads. I3 instances offer the best price per I/O performance for workloads such as NoSQL databases, in-memory databases, data warehousing, Elasticsearch and analytics workloads.
Using EC2 I3, we could take our sample data (40GB of the 11TB) and begin to measure time and cost savings against the data updates and high IOPS. We found that the time to run the scan and produce results fell from 79 minutes to around 29-31 minutes. In further tests this led to a 65% reduction in overall time, with the cost savings reflecting the speed gains.
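As a quick check on the arithmetic, the sketch below works out the percentage reduction for the sample run quoted above. The helper name `percent_reduction` is an assumption introduced for illustration; only the 79 minute baseline and the 29-31 minute results come from the case study.

```python
def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Return the percentage reduction going from before_minutes to after_minutes."""
    if before_minutes <= 0:
        raise ValueError("before_minutes must be positive")
    return (before_minutes - after_minutes) / before_minutes * 100


baseline = 79  # minutes, original scan time from the case study
for after in (29, 31):  # minutes, range observed on EC2 I3
    print(f"{baseline} -> {after} min: {percent_reduction(baseline, after):.1f}% reduction")
# 79 -> 29 min: 63.3% reduction
# 79 -> 31 min: 60.8% reduction
```

On the sample run alone the reduction is roughly 61-63%; the 65% figure is the one reported for the further tests.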
Time to value
As with many of our data migration projects, we were able to quickly understand the customer's goals and work backwards to craft an architecture fit for purpose. Over 8 days, spread across 7 weeks, we worked closely with SCANOSS and delivered the high IOPS, scan times and cost savings they wanted.
Integrated training
Firemind and SCANOSS worked closely together to identify challenges in the project quickly, and to provide ongoing training on the benefits of EC2 I3 instances. This helped ensure the SCANOSS Lead Developer would feel confident moving forward as their Microsoft Azure environment moved to AWS.
High transaction workloads
11TB of open source data is no joke! A business that needs to scan, update and track changes across such a vast amount of data depends on high transaction workloads. This project showcased what storage optimised instances can do, with results that will have a direct effect on the future of SCANOSS.