The developers of the High Performance Storage System (HPSS) are devising ways to streamline and rationalize its data management capabilities for the upcoming eighth generation. Twenty-five years ago, US Department of Energy research laboratories and IBM built HPSS together to support massive government science research projects. Why? Hierarchical storage management is a rewarding concept: it uses organizational policies and software automation to decide which data to save, where to store it, when to migrate it between storage devices, and when to delete it.
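The tiering decision such a system automates can be sketched in a few lines. The thresholds, tier names, and the small-file rule below are illustrative assumptions for this sketch, not actual HPSS policy or configuration:

```python
from datetime import datetime, timedelta

# Illustrative thresholds -- real HPSS policies are site-configured.
HOT_WINDOW = timedelta(days=7)      # recently accessed -> keep on fast disk
WARM_WINDOW = timedelta(days=90)    # older -> candidate for tape migration

def placement_policy(last_access: datetime, size_bytes: int,
                     now: datetime) -> str:
    """Decide which storage tier a file should live on (hypothetical rules)."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "disk"            # fast primary storage
    if age <= WARM_WINDOW or size_bytes < 1_000_000:
        return "disk-cache"      # staged on disk, tape copy exists
    return "tape"                # archive tier

now = datetime(2018, 6, 1)
print(placement_policy(now - timedelta(days=2), 10**9, now))    # disk
print(placement_policy(now - timedelta(days=400), 10**9, now))  # tape
```

A real policy engine would also weigh quotas, project ownership, and access patterns; the point is that placement is a pure function of metadata, which is why metadata quality matters so much below.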
“How do you know what you’re archiving? We’re talking about archives now that are hundreds of petabytes to an exabyte. We think we’re going to be there in 2-3 years,” said Todd Herr, a supercomputing storage architect at Lawrence Livermore National Laboratory in California.
The HPSS website lists 37 publicly disclosed customers; others are kept confidential. Version 7.5.1, released last year, is the current release; version 7.5.2 is due soon, and version 7.5.3 is planned for next year, according to the online roadmap.
Version 8 does not yet appear on the official roadmap, but here is what insiders have to say about it…
“What I think our challenge is, is to become good data curators. And I think that’s where we’re going to point the product,” Herr shared. The goal is to make HPSS more capable of data mining and of assigning metadata to itself.
The first step toward that goal is exposing information in the archive to overarching namespace applications (software made by companies such as Atempo, Robinhood, Starfish, and StrongLink). “Right now we are working on that,” Herr explained. “I think the next step there is scaling out metadata performance, such as database partitioning and virtualizing multiple processors when performing searches.”
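The database partitioning Herr describes can be illustrated with a toy example: hash namespace entries across partitions, then fan a metadata search out over all partitions in parallel. This is a hypothetical sketch with made-up paths and fields, not HPSS's actual metadata schema:

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4

def partition_of(path: str) -> int:
    # Hash-partition metadata records by file path.
    return hash(path) % NUM_PARTITIONS

# One in-memory dict stands in for each database partition.
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def insert(path: str, metadata: dict) -> None:
    partitions[partition_of(path)][path] = metadata

def search(predicate):
    """Scan every partition in parallel and merge the matches."""
    def scan(part):
        return [p for p, md in part.items() if predicate(md)]
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        results = pool.map(scan, partitions)
    return [path for chunk in results for path in chunk]

insert("/proj/a/run1.dat", {"owner": "alice", "size": 42})
insert("/proj/b/run2.dat", {"owner": "bob", "size": 7})
print(search(lambda md: md["owner"] == "alice"))  # ['/proj/a/run1.dat']
```

With partitioned metadata, search time scales with the largest partition rather than the whole namespace, which is the "virtualizing multiple processors" effect at a miniature scale.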
Another important part of HPSS is the software that works with tape storage. “What we’re trying to do is enable fast access to tape. If you look across the industry spectrum, the words fast and tape generally don’t go together,” Herr said. Scientists at Livermore can still access research data on tape that is more than 50 years old.
Speed-matching buffers can save the day: placed between primary disk storage and archive tape storage, they can be used for both reads and writes. Other physical improvements include faster head placement and faster tape motors.
“We’re going to hit a problem way faster than most sites, and certainly faster than the vendors themselves because they cannot replicate our environment in most testing,” Herr asserted.
Sierra, the lab's next supercomputer, is expected to operate at up to 125 petaflops and will have a 125-petabyte file system, giving the team ample room to test new ways of speeding up performance and to administer advanced data storage mechanisms.
Source: https://www.techrepublic.com/article/fed-and-ibm-researchers-adding-new-intelligence-to-massive-storage-management-system