
Welcome to the Official RapidDisk Project Wiki


An open source, enhanced Linux RAM disk module. This wiki is intended to serve as a source for RapidDisk-related materials, including installation, setup, configuration, and troubleshooting guides.


USE THE FASTEST DATA DRIVE ON THE PLANET! The RapidDisk collection of software offers nothing but lightning-fast performance. No additional hardware necessary: just use existing DRAM. NO NEED TO PURCHASE NEW HARDWARE. Breathe new life into older or existing hardware and give it an extra boost in performance without spending a single dime.


About

From the beginning, RapidDisk (or rxdsk) was designed for high-performance environments and with simplicity in mind. Utilizing a userland binary, the system administrator can dynamically add new RAM-based block devices of varying sizes, remove existing ones, list all existing RAM block devices, and more. The rxdsk module has been designed to allocate from the system's memory pages as it needs them (not all at once during creation) and is capable of addressing enough memory to support gigabytes, if not terabytes, of available Random Access Memory. On top of that, with the later integration of RapidCache, one can leverage the high performance of RapidDisk volumes and map them as caching nodes for slower drives. The best part of this solution is that it is free: it will only cost you a bit of time to learn, test, and configure. And if you need help, I offer support and consultation services, and nobody knows my code better than me.
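
As a rough illustration of that workflow, the session below attaches a RAM disk, lists the attached nodes, resizes one, and detaches it. Note – the exact rxadm flag names and size units shown here are assumptions and may differ between releases; consult the rxadm man page shipped with your copy.

 $ sudo modprobe rxdsk
 $ sudo rxadm --attach 128       # assumed flag: attach a new 128 MByte node (e.g. /dev/rxd0)
 $ sudo rxadm --list             # assumed flag: list all attached rxdsk nodes
 $ sudo rxadm --resize rxd0 256  # assumed flag: grow the volume to 256 MBytes
 $ sudo rxadm --detach rxd0      # assumed flag: detach the node and free its memory pages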


Features

  • Dynamically attach, detach, and resize multiple rxdsk RAM disk nodes.
  • Read/write from the block device at very high speeds: 20-30x (and even more) faster than your local magnetic SATA drive.
  • Create and manage volume sizes from as small as a few MBytes to (in theory) as large as TBytes, in both the CLI and the Python/pyGTK graphical wrapper.
  • Archive and compress, or decompress and restore, rxdsk node data.
  • rxdsk nodes allocate memory pages as they are needed.
  • Map an rxdsk volume to an existing physical drive as a caching node to increase I/O performance. (2.0 only – see RapidCache)
  • The best part of both the RapidDisk and RapidCache modules is that by writing or caching data in memory, you extend the life of the physical storage device: you are either not moving mechanical components seeking from sector to sector, or not reducing the total cell PE count on an SSD.


Key environments for use

  • Quick I/O storage (configured as a DRAM-based SSD with syncing to persistent storage enabled)
    • Mounted as a block device, labeled with a traditional file system, and functioning as a traditional storage device (see the sketch after this list)
  • File system metadata, application, and/or database caching
  • Local and remote block device caching (2.0 only – see RapidCache)
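
A minimal sketch of that first scenario: create an rxdsk node, label it with a traditional file system, mount it, and periodically sync it to persistent storage. Note – the rxadm flag is an assumption (see the man page); the remaining commands are standard tools, and the mount point and sync destination are hypothetical.

 $ sudo rxadm --attach 1024              # assumed flag: attach a 1 GByte node at /dev/rxd0
 $ sudo mkfs.ext4 /dev/rxd0              # label it with a traditional file system
 $ sudo mkdir -p /mnt/rxd0
 $ sudo mount /dev/rxd0 /mnt/rxd0        # now functions as a traditional storage device
 $ sudo rsync -a /mnt/rxd0/ /data/rxd0/  # sync to persistent storage (run periodically, e.g. via cron)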


Compatibility

The RapidDisk collection is officially supported on the following operating system distributions:

Other distributions will work, but are not officially tested.


Media or public coverage


Additional information

This project is licensed under the GNU General Public License (GPL) version 2 (GPLv2) for the Linux kernel module and GPLv3 for the userland binary. It is therefore open to contribution from its community of users. If you have any questions or concerns about this project, please do not hesitate to contact me at devel AT petroskoutoupis.com.


Other interesting Linux RAM drive & caching projects


Software


RapidDisk

RapidDisk Diagram

RapidDisk (or rxdsk) is designed for environments requiring high-performance data transfers and has been designed with simplicity in mind. Utilizing a management binary, the system administrator can dynamically add new RAM-based block devices of varying sizes, resize those volumes, remove existing ones, and list all existing RAM block devices on the system. The best part of the rxdsk module is that it allocates storage space from the system's memory pages as it becomes needed and scales upward to address Random Access Memory sizes in the gigabytes, if not terabytes.

To date, RapidDisk has been adopted by many companies and institutions and is used in their internal data centers and testing infrastructures, in distributed software and hardware products, and even as a learning tool.


Performance

Performance Summary Bar Graph

The performance numbers obtained with RapidDisk will always be consistent, as opposed to Flash memory, where over time, once the Flash storage enters its Program/Erase (PE) cycles, speeds drop drastically and results become inconsistent. You can observe some basic numbers below. Note – sequential and random access I/O are indistinguishable on both DRAM and Flash-based technologies, as there are no movable components to make the two differ; the sequential access results below will not differ from numbers obtained with random access I/O. Recently it has been brought to my attention that certain companies using RapidDisk have obtained far greater numbers than the dated ones provided below: performance as fast as 40 Gigabytes per second! Yes, you read that correctly, 40 GB/s.


These numbers were obtained on an older 64-bit system with a 3.1 GHz Xeon quad-core processor. I used fio 2.0.3 and ran the tests on volumes formatted and mounted with an Ext4 file system. Each job file ran 4 processes (one at a time): (1) Sequential Reads, (2) Random Reads, (3) Sequential Writes, and (4) Random Writes. Also note that I formatted the output so that each process group's results appear together.
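
For reference, a job file along those lines might look like the following. Note – this is an approximation; the section names, ioengine, directory, and direct I/O settings are assumptions, as the original job files are not reproduced here. The stonewall option is what forces the four processes to run one at a time.

 ; rxdsk-test.fio - 1M transfers against a 1G file at queue depth 32
 [global]
 ioengine=libaio
 direct=1
 bs=1M
 size=1g
 iodepth=32
 directory=/mnt/rxd0

 [seq-reads]
 rw=read
 stonewall

 [random-reads]
 rw=randread
 stonewall

 [seq-writes]
 rw=write
 stonewall

 [random-writes]
 rw=randwrite
 stonewall

Run it with: $ fio rxdsk-test.fio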


Profile                        RxDSK                   SATA (mechanical HDD)
1M xfer, 1G file, 32 depth     Seq Reads:  5.3 GB/s    Seq Reads:  154 MB/s
1M xfer, 1G file, 32 depth     Seq Writes: 3.2 GB/s    Seq Writes: 131 MB/s
1M xfer, 1G file, 32 depth     Ran Reads:  7.3 GB/s    Ran Reads:   93 MB/s
1M xfer, 1G file, 32 depth     Ran Writes: 3.0 GB/s    Ran Writes: 106 MB/s
4K xfer, 1G file, 32 depth     Seq Reads:  1.7 GB/s    Seq Reads:  116 MB/s
4K xfer, 1G file, 32 depth     Seq Writes: 1.5 GB/s    Seq Writes:  70 MB/s
4K xfer, 1G file, 32 depth     Ran Reads:  1.6 GB/s    Ran Reads:  2.2 MB/s
4K xfer, 1G file, 32 depth     Ran Writes: 1.4 GB/s    Ran Writes: 1.9 MB/s


RapidCache

RapidCache Diagram

RapidCache (or rxcache) is designed to leverage the high-speed performance of the RapidDisk RAM drive: utilizing the (Linux-native) Device Mapper framework, it maps an rxdsk volume to act as a block device's write/read-through cache. This can significantly boost the performance of a local or remote (and slower) disk device. The 2.0 release supports Linux kernels 2.6.32 and later.


What is Write-Through and Read-Through Cache?

This is a scheme in which an application treats the cache as the main data store, reading data from it and writing data to it. The cache itself is responsible for reading and writing this data to the permanent storage volume, thereby relieving the application of this responsibility.


In RapidCache, all writes are cached to an rxdsk volume but are also written to disk immediately. All disk reads are cached. The cache is not persistent across device removal, reboots, or removal of the Device Mapper mapping. This module does not store any cache metadata on RapidDisk volumes; it keeps it in memory outside of rxdsk, and the replacement policy is FIFO.


The best part of RapidCache is that you can map and unmap a cache to any volume at any appropriate time without affecting the integrity of the data on the physical storage drive. Again, this is an advantage seen only with the safest caching methods, write/read-through caching. One minute you can unmount an existing volume, map it to an rxcache volume, and remount it; a few minutes later, you can return it to its original and unmapped state. It is that simple and safe.
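
That remap cycle might look like the sketch below. Note – the --rxc-map/--rxc-unmap flags, the rxc0 mapping name, and the device/mount paths are all assumptions for illustration; check the rxadm man page for the exact syntax on your release.

 $ sudo umount /mnt/data                  # take the volume offline
 $ sudo rxadm --rxc-map rxd0 /dev/sdb1    # assumed flags: front /dev/sdb1 with the rxd0 RAM disk
 $ sudo mount /dev/mapper/rxc0 /mnt/data  # remount through the cache mapping
   (some time later, to return to the original state)
 $ sudo umount /mnt/data
 $ sudo rxadm --rxc-unmap rxc0            # assumed flag: tear down the mapping
 $ sudo mount /dev/sdb1 /mnt/data         # back to the original, unmapped volume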


Environment(s) for use

To use this effectively, the administrator must evaluate the appropriate configuration with respect to their I/O profile, which includes file and transfer sizes. It should also be noted that this module is most ideal in environments with frequent read operations, especially over the same data regions.


  • Enable general block device caching for: (1) locally attached disk devices and (2) remotely attached disk devices mapped over a Storage Area Network (SAN).
  • Ideal for database requests, specifically frequently accessed data.
  • Also ideal for frequent small-file read/write operations.
  • Alleviate read operations to Flash technologies and thus avoid read-disturb corruption and data loss.

Do not use this in a virtual guest or with a loopback device. You will not see any performance improvement (for reasons I do not feel like explaining at the moment); in fact, performance will be worse in such environments. Only use this with an actual physical disk device.


Performance

Real-world RapidCache numbers vary depending on whether the data is in the cache or not. The advantage of RapidCache is that it significantly decreases random access latencies for cached data. When the accessed data is cached, the performance numbers will always be consistent, as opposed to Flash memory, where over time, once the Flash storage enters its Program/Erase (PE) cycles, speeds drop drastically and results become inconsistent. You can observe some basic numbers below. Note – sequential and random access I/O are indistinguishable on both DRAM and Flash-based technologies, as there are no movable components to make the two differ; the sequential access results below will not differ from numbers obtained with random access I/O. Recently it has been brought to my attention that certain companies using RapidCache have obtained far greater numbers than the dated ones provided below.


These numbers were obtained on an older 64-bit system with a 3.1 GHz Xeon quad-core processor. I used fio 2.0.3 and ran the tests on volumes formatted and mounted with an Ext4 file system. Each job file ran 4 processes (one at a time): (1) Sequential Reads, (2) Random Reads, (3) Sequential Writes, and (4) Random Writes. Also note that I formatted the output so that each process group's results appear together.


Profile                         RxCACHE                 SATA (mechanical HDD)
4K xfer, 32M file, 32 depth     Seq Reads:  802 MB/s    Seq Reads:  104 MB/s
4K xfer, 32M file, 32 depth     Seq Writes:  68 MB/s    Seq Writes:  70 MB/s
4K xfer, 32M file, 32 depth     Ran Reads:  907 MB/s    Ran Reads:  7.9 MB/s
4K xfer, 32M file, 32 depth     Ran Writes: 2.8 MB/s    Ran Writes: 2.8 MB/s



Note – In my personal tests using RapidCache with 1 MByte transfers on the same hardware, reading cached data both sequentially and randomly, I have also obtained numbers as high as 1.0 – 1.2 GB/s and a bit more.


Downloads and Installation

You can clone the latest (and stable) 2.x source tree from the git repository: $ git clone http://git.rapiddisk.org/rxdsk-2.x.git

You can clone the latest (and stable) 1.x source tree from the git repository: $ git clone http://git.rapiddisk.org/rxdsk-1.x.git

If you already have an older copy of the source code previously cloned from the git repository, you can simply update it with the git pull command. You can also download the latest tarball release from the RapidDisk Sourceforge page.

Change into the directory and run the Makefile to build the management utility and kernel module:

 $ cd rxdsk-1.x
 $ make

NOTE – To build the rxadm management utility, you will need the zlib libraries and zlib development files installed on your host system. As of 1.3, the build links against zlib for compressed archiving and decompressed restoration of rxdsk volumes.
NOTE – While everything will still install without it, the pyRxAdm graphical wrapper (introduced in 2.2) requires both the Python programming language and the pyGTK module.
NOTE – To build the kernel module, you are required to have either the full kernel source or the kernel headers installed for your current kernel revision.
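
On a Debian- or Ubuntu-style system, for example, the prerequisites above can usually be satisfied with something like the following; package names are assumptions and vary by distribution and release.

 $ sudo apt-get install build-essential zlib1g-dev  # compiler plus zlib development files
 $ sudo apt-get install linux-headers-$(uname -r)   # kernel headers for the running kernel
 $ sudo apt-get install python python-gtk2          # only needed for the pyRxAdm wrapper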


And install the management utility, kernel module, and man page (management utility is installed in /sbin): $ sudo make install

To insert the rxdsk module: $ sudo modprobe rxdsk

To remove the rxdsk module: $ sudo modprobe -r rxdsk

To insert the rxcache module: $ sudo modprobe rxcache

To remove the rxcache module: $ sudo modprobe -r rxcache
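
To verify that the modules are resident and that device nodes appear, something like the following will do; these are standard commands, and the rxd* nodes will only exist after at least one volume has been attached.

 $ lsmod | grep rx     # confirm rxdsk/rxcache are loaded
 $ ls -l /dev/rxd*     # rxdsk block device nodes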


Installing modules with DKMS

This is supported as of release 2.9.1. To build: $ sudo make dkms


Installing / uninstalling the administration utilities

This is supported as of release 2.9.1.

Install: $ sudo make tools-install

Uninstall: $ sudo make tools-uninstall


Testing

The majority of data integrity testing and benchmarking was accomplished using the fio test utility. Byte-level drive contents were also closely examined via the native hexdump utility, invoked as: $ hexdump -Cv /dev/rxd0 | less
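
A quick integrity spot check in the same spirit can be done with dd and md5sum. Note – the rxadm flag is an assumption; the rest are standard tools.

 $ sudo rxadm --attach 64                                # assumed flag: 64 MByte node at /dev/rxd0
 $ dd if=/dev/urandom of=/tmp/pattern bs=1M count=16     # generate a random test pattern
 $ sudo dd if=/tmp/pattern of=/dev/rxd0 bs=1M            # write it to the RAM disk
 $ sudo dd if=/dev/rxd0 of=/tmp/readback bs=1M count=16  # read it back
 $ md5sum /tmp/pattern /tmp/readback                     # the checksums must match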

Located in the test/ subdirectory of the source code is a small collection of very quick tests to run on a RapidDisk-attached raw RAM disk block device. The rxio program tests a few basic block device functions, including getting the volume size, reading, writing, and seeking; the rxioctl program tests the kernel module's RXD_GET_STATS custom ioctl; and the rxflush program sends the BLKFLSBUF ioctl command to flush the block device's buffer cache.


Building & executing the test tools

To compile:

 $ gcc rxio.c -o rxio
 $ gcc rxioctl.c -o rxioctl
 $ gcc rxflush.c -o rxflush

To execute in place (make sure you create an rxd0 RapidDisk volume prior to executing):

 $ ./rxio
 $ ./rxioctl
 $ ./rxflush

Note – The applications will only test the node named /dev/rxd0. You can change this in the code if necessary and recompile.


Tutorial and Miscellaneous


Changelog

2.13

  • Bug fix with rxadm binary and mapping RapidCache to pre-existing partitions.
  • Added more information to RapidDisk proc_fs file.
  • Addressed compile warnings for GCC 5.1

2.12

  • Updated modules for kernel 3.14

2.11

  • Updated rxcache for Red Hat 6.4 (device mapper conflict)
  • Addressed incorrect description of maximum number of rxdsks supported.

2.10

  • Updated modules for kernel 3.10.
  • Updated Makefile for cross compiling install.

2.9.2

  • Addressed a bug in rxdsk print statement (wrong type). Thanks go to Neo for discovering and patching it.

2.9.1

  • Minor update adding DKMS support.
  • Adding support to build and install/uninstall tools separately (i.e. without modules, as in when installing with DKMS).

2.9

  • Added better implementation of BLKFLSBUF ioctl to rxdsk module. This will “flush data” and truncate pages.
  • Added flush command to rxadm utility.
  • Added support for Linux kernel 3.9. Tested on 3.9.2.

2.8

  • Cleaned up code and removed unused and unimplemented caching feature (write-around).
  • Added support for Linux kernel 3.8. Tested on 3.8-rc7.

2.7

  • Made some modifications to the modules’ makefile.
  • Added support for Linux kernels 3.6 & 3.7. Tested on 3.6.9 and 3.7-rc8.

2.6

  • Minor rxcache kernel module update: Make spinlocks less greedy by removing most of the “disable ALL interrupts” spinlocks and replacing them with spinlocks to disable interrupts ONLY from bottom halves.

2.5

  • Added support for building in 3.4 and later Linux kernels.
  • Updated module Makefile to point to a different DESTDIR and KSRC (for cross-compiling)

2.4

  • Removed warning for RapidCache build (placed check for 32/64 bit arch).
  • Addressed an issue with md raid 1 (mirror) and using rxdsk in which the md driver would routinely send I/O of size 0 bytes and rxdsk would return an EIO, failing the array. Problem and solution found and provided by Dmitry Trikoz of Stratus Technologies.

2.3

  • Addressed warning generated for kernels 3.2 and later with the return type of the blk_queue_make_request request_queue function.
  • Added comments and cleaned error messages in pyRxAdm.
  • Added comments to rxadm files.

2.2.1

  • Added additional functionality to pyRxAdm (add, map, archive, restore) also added some more error checking.
  • Fixed bug in rxadm during the archival process.
  • Updated version no. and removed b’s to move from beta to production.
  • Added a couple more switches to list version/help info of rxadm
  • Cleaned up rxadm logo for pyRxAdm

2.2B

  • Added pyRxAdm wrapper to list and to unmap/detach rxdsk/rxcache volumes.
  • Modified short-list feature output and modified the error statement
  • Modified cmd/Makefile
  • Added logo for wrapper
  • Updated version no. for kernel modules

2.1B

  • Added --short-list support in rxadm management utility.
  • Cleaned up debug messages on modules and added a couple of more.

2.0.1B

  • Fixed bug #5 relating to using the rxadm utility without any nodes listed in /dev/mapper.
  • Cleaned up a few messages in both rxdsk.c and rxcache.c.

2.0B

  • Added rxcache write/read through caching module support.
  • Added rxcache management features in rxadm utility.
  • Modified input for archive/restore in rxadm to not use absolute path for rxdsk node. It maintains a form of consistency across all commands.

1.4

  • Fixed bug #4 by adding handling for the BLKFLSBUF ioctl() command. This is specifically for when the user places an rxd node in an mdadm RAID array.
  • Added a new test file to test the new ioctl command.
  • Cleaned up the Makefiles (bug #2) a bit and now the user can build and install the kernel module from the root of the package tree as opposed to doing it from the module directory.

1.3.2

  • Fixed bug #3 which was for a warning during the build of rxadm on an x86_64 architecture. Thanks go to Gustaf Ullberg for discovering the root cause and providing a quick solution.

1.3.1R2

  • Removed the “b” from version strings to signify non-beta. This project seems to be production ready.
  • Also added a test suite and some additional test tools for testing data integrity, performance, etc. This will help a lot for feature testing in future releases.

1.3.1B

  • Added check in management utility to make sure that the rxdsk node is present before archiving and restoring compressed/decompressed images.

1.3B

  • Added new feature to archive/restore an rxd volume to/from a zlib compressed data file (many thanks to Simon Ball for suggesting a similar feature).
  • Added discard support.
  • Added an ioctl to handle an invalid query sent by later versions of udev that correspond with Linux kernels 2.6.35 and above.
  • Also integrated patch submitted by James Plummer of Stratus Technologies to address 32 bit limitation of rxadm utility rxdsk creation/resizing by casting the variable to a 64-bit type. Patch also included minor clean up code/optimizations for the same rxadm util.

1.2B

  • Optimized the configuring of the request queue.
  • Added checks for module to build from kernels 2.6.32 all the way to the latest (currently 3.0.3).

1.1B

  • Added support for dynamic resizing of attached rxdsk volumes.

1.0B

  • Official stable public release.


Frequently Asked Questions (FAQ)


Why create a new Linux RAM drive module?

The traditional Linux RAM disk can only be statically loaded at boot time, with a fixed size for all volumes of no more than 16 MB each. That Megabyte value can be adjusted, but the result is still the same: low capacities and poor manageability. You can reference the brd code in the Linux kernel for more details. I was primarily inspired by the Solaris implementation of their ramdisk module and its accompanying userland tool, ramdiskadm. I was also partly inspired by the FreeBSD implementation called md, or Memory Disk. My Linux module allows for dynamic creation and removal of RAM-based block devices for high performance computing, at sizes varying from 16 MB to as high as 1 TB and larger. Plus, it incorporates more up-to-date features and functionality.
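
To make the contrast concrete: brd's geometry is fixed by module parameters at load time, while rxdsk volumes come and go at runtime. The rd_nr/rd_size parameters below are real brd options; the rxadm flags are assumptions for illustration.

 $ sudo modprobe brd rd_nr=1 rd_size=16384  # brd: one /dev/ram0, sized in KB once, at load time
 $ sudo modprobe rxdsk                      # rxdsk: no sizes declared up front
 $ sudo rxadm --attach 4096                 # assumed flag: create a 4 GByte node on demand
 $ sudo rxadm --detach rxd0                 # assumed flag: destroy it again, freeing the memory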

Also, all other Linux implementations of RAM disks exist as file systems in memory. These include tmpfs and ramfs. These file systems are not always the most ideal solution to achieving high performance.


What inspired this project?

Achieving high performance played a big role, but the project also grew out of an article Petros Koutoupis wrote for a Linux publication covering the Linux RAM disk and the Solaris implementation of their ramdisk module.


What additional benefits will RapidDisk provide?

Aside from increasing productivity with little to no bottlenecks in accessing data, there is also the opportunity to save costs in cooling. For instance, if a lot of frequently accessed data is moved to a stable and redundant RAM disk, there is less need to access mechanical Hard Disk Drives, which contain many moving parts and in turn use much more power while generating significant heat. Data centers around the world invest a lot of money in cooling to keep their equipment operational at moderately cool and stable temperatures.


How much memory should I allocate to my RapidDisk volumes?

As much as you see fit to do the job, so long as you do not exceed the system's limitations. We are not responsible for misuse or miscalculation, and we do not manage this for the user. As the user, you will need to determine how much memory the operating system needs to manage everything else and how much it can afford to give to your RAM disks.
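
Checking what the system can actually spare before attaching a volume is straightforward with standard commands:

 $ free -m                        # total, used, and free memory in MBytes
 $ grep -i memfree /proc/meminfo  # free memory as reported by the kernel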


Will you ever provide pre-built binary packages?

If and when this project matures, I will begin to provide binary (rpm & dpkg) packages to some of the major distributions. For now, the only way to get the code is to clone it from the git repository and build it from source.


Why must I have ZLIB preinstalled to build 1.3B and later?

As mentioned above, to build the rxadm management utility you will need the zlib libraries and zlib development files installed on your host system. As of 1.3, the build links against zlib for compressed archiving and decompressed restoration of rxdsk volumes. The idea is to take an existing volume and compress it, up to the last allocated sector (which the module keeps track of and sends to rxadm via an ioctl() call), into a local file. This saves storage space for both the archives and the restored volumes, the latter of which is sized by the last allocated sector prior to archiving. Note – It is advised to unmount the rxdsk volume before archiving.
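
As a sketch, the archive/restore cycle might look like this. Note – the --archive/--restore flags and argument order are assumptions based on the feature description above; consult the rxadm man page.

 $ sudo umount /dev/rxd0                      # advised before archiving
 $ sudo rxadm --archive rxd0 rxd0-backup.dat  # assumed flags: compress up to the last allocated sector
 $ sudo rxadm --restore rxd0 rxd0-backup.dat  # assumed flags: decompress back onto an attached node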


Why am I seeing failures in an MD RAID 1 mirror?

I do not know the reason behind this, but if you are using an rxdsk block device (alongside a physical block partition or a file configured as a loopback device) in an md RAID 1 mirror, fsyncs sent to the md volume will fail the rxdsk block device. This has been observed with mdadm versions 3.1.3-0 (Fedora 14) and 3.1.4 (Ubuntu 11.10) but does not occur on the older releases of Red Hat/CentOS 5.x with mdadm version 2.6.9-x. This problem is not unique to rxdsk: it also occurs with the kernel's integrated brd RAM disk module (i.e. /dev/ram0, etc.) and seems more related to the md module. Again, this occurs only with a RAID 1 mirror; I also tested RAID levels 0 and 5, neither of which exhibits the same fsync problems. Sending an fsync to the rxdsk volume while it is not in an md RAID 1 array will not exhibit any problems. I submitted a bug report for this to the kernel team here.

Another thing to note is that on Fedora 14, when configured in the md RAID 1 array, any and all ioctl commands sent to the rxdsk block device will result in the following messages to appear in the kernel log:

rxdsk: -2146694896 is an unsupported ioctl command

I cannot explain this either. For whatever reason, even after you disable the md RAID, the ioctl errors continue to reach the RAM disk, at least until you detach and re-attach it. These are the only known and documented issues when using an rxdsk volume in an md RAID.

Update – This has been addressed in version 2.4. The core issue was that the MD RAID 1 array was routinely sending an I/O command of 0 bytes, which would cause a return value (initialized to EIO) to fall through the I/O handling routine and return to the calling process with an error. Thanks go to Dmitry Trikoz of Stratus Technologies for finding this and providing a solution.


How much does it cost me to obtain and use RapidDisk?

The best part of this solution is that it is free. It will only cost you a bit of time to learn, test and configure the solution. And if you need help, I offer support and consultation services and nobody knows my code better than me. For more information, please visit the Support page.


Why should I use this instead of a traditional and persistent Flash Solid State Drive?

Aside from the fact that DRAM performs much better than NAND Flash memory, DRAM's performance is also consistent. DRAM also does not have a limited cell life restricting its write operations. Flash memory hits a breaking point once its cells have been written to and it enters its Program/Erase (PE) cycles. This significantly affects write performance (by more than half at times), and to combat it, vendors resort to tricks such as over-provisioning the NAND memory, wear leveling, write coalescing, etc. These methods also have an expiration date; at some point, you will start to hit those PE cycles.

There is an even scarier concern with NAND technologies, the one thing the vendors of these NAND chips, and in turn Flash drives, are not divulging to their consumers and users: read disturb. This can cause data loss or data corruption, and not all NAND controllers are equipped to avoid it.

The method used to read NAND flash memory can cause nearby cells in the same memory block to change (become programmed) over time. This is known as read disturb. The threshold number of reads is generally in the hundreds of thousands between intervening erase operations. If one cell is read continually, it is not that cell that will fail but rather one of the surrounding cells, on a subsequent read.

One thing that the RapidDisk technologies can do is help alleviate read operations to Flash technology by redirecting all read operations to DRAM instead.


Community


Getting Involved

As an individual trying to make a name for myself in the data storage industry, I strongly encourage community involvement around my projects. I am committed to delivering open source solutions to the growing open source community and highly encourage code contributions and bug submissions, which can be made in the project's official bug tracker.

If you wish to contribute code, please send it as a patch, with details on the version the patch was branched from and which bugs and/or feature enhancements the patch addresses.


Obtaining the Source Code

Navigate to the appropriate project from the project page, where you will find directions to clone from that project's version-controlled repository. Verify that you have the appropriate version control application installed; currently I am using git. To clone the source code on a UNIX or Linux machine, invoke the following on the command line:

$ git clone http://git.rapiddisk.org/rxdsk-2.x.git


Obtaining Documentation

Documentation pertaining to development specification(s) and details of this project will be maintained on this project site and also in the distribution of the project’s packages. If you feel that the documentation provided is not sufficient for use, please do not hesitate to submit a bug detailing what is needed in the project’s official bug tracker.


Submitting a Bug

As highlighted above more than once, to submit a bug, go to the project’s official bug tracker.


List of Contributors

I want to extend a thank you to all those who have contributed to the RapidDisk project by submitting patches or by helping test code fixes. The list is as follows:

  • James Plummer of Stratus Technologies
  • Gustaf Ullberg
  • Dmitry Trikoz of Stratus Technologies
  • Neo
  • Efstathiou Efstathios
  • Marcel Huber
  • Steven Rudolph


Support


General Information

Included within each stable release are man pages covering rxadm administration.


Bug submissions or feature requests

If you have found a bug in anything relating to this project, have a feature you want implemented (note that not all feature requests will be accepted), or feel that certain information is missing from this project site, please do not hesitate to submit a report in the project's bug tracker.


Commercial Support

For commercial support, please visit www.Inverness-Data.com or contact support AT inverness-data DOT com.