Hardware Sizing
Essay by review • December 5, 2010 • Research Paper • 3,119 Words (13 Pages) • 1,473 Views
Basic Requirement statement:
* The Ripple Premium server acts as a repository of content for subscription by mobile users.
* Currently the content is collected through an Excel sheet based mechanism once a day and loaded into this server.
* The Excel sheet contains location and description data about content on the different providers' servers. It has to be prepared by the content providers and given to PurpleAce (PA). Macros then load the data into PA's database so that content can be located on demand.
* The aim is to automate this process.
* The indexing of newly added content at any content provider's site can be a scheduled process. It can run once a day, as the current Excel-based process does, or twice a day, and so on.
* There should be a way to identify content that has already been added. One option is to perform the identification at the PA end, but this leads to a lot of overhead, which is best avoided. A better option is to keep a file or database table at the original content provider's end and check it for already sent content before sending the metadata over the wire. If this is not feasible, the content collection should run during non-working hours so as not to affect the network, with the checking done at the PA end.
* Ideally the XML is to be provided by the content provider, whether it conforms to PA's generic standard or to the content provider's own format.
* In the design, it may be an idea to provide for both checking options, selected by a configuration value: if checking is possible on the provider (client) side, it is turned on there; otherwise it is turned on at the PA side.
* We can make the content provider responsible for the validity of the data. However, the following checks should be done before pumping metadata and binary files to the RP server: whether the indicated resource actually exists on the provider's machine, and whether its size is correct (i.e. above 0 KB).
* There is a need to look at the current table structure used in PA for holding content information. If necessary, this can be modified.
* A typical way to check for already loaded content is to form a composite key from the provider name and the content file name.
* Another way is to store the XML data itself, either in an XML database or in a varchar column in the database. In this case, repeated content can be filtered out.
* The content provider has to expose their metadata in a standard format so that PA can read it in a uniform fashion.
* Large content providers would have their own metadata standards, and they may not be willing to modify these for the sake of PA.
* The three content providers/aggregators that were checked out were Walt Disney, Rediff and Yahoo!.
* The approach can be to build a specific PA content metadata standard modelled on similar standards such as RSS. In parallel, connectors for different large content providers can be built from information received from them. Where possible, PA's standard can be used.
* Based on the likely major content providers, some connectors can be built to begin with. For example, we may have a Rediff connector, a Walt Disney connector, etc., along with the PA connector. For this, specific information on the content metadata standards of each of these providers needs to be collected.
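The existence and size checks and the composite-key lookup described in the list above can be sketched as follows. This is only a minimal illustration under stated assumptions, not PA's implementation: the `ContentRecord` fields and the `is_valid` and `filter_new` names are hypothetical, the `seen` set stands in for a database table of already loaded content, and the agent is assumed to run where the content files are directly readable.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentRecord:
    """Hypothetical metadata record; the field names are assumptions."""
    provider: str
    file_name: str
    path: str  # location of the content file as seen by the agent


def is_valid(record):
    """Check that the resource exists and is larger than 0 bytes."""
    return os.path.isfile(record.path) and os.path.getsize(record.path) > 0


def filter_new(records, seen):
    """Return valid records whose (provider, file name) key is not in `seen`.

    `seen` is a set standing in for a table of already loaded content.
    It is updated in place with the keys of the accepted records.
    """
    accepted = []
    for record in records:
        key = (record.provider, record.file_name)  # composite key
        if key in seen or not is_valid(record):
            continue
        seen.add(key)
        accepted.append(record)
    return accepted
```

Only the records returned by `filter_new` would then be sent on to the RP server; whether the check runs on the provider side or the PA side would depend on the configuration value mentioned above.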
Content Storage Approach:
* Currently the content files (wallpapers etc.) as well as the metadata about them are stored on the PA server. Each content file is stored with two XML files, and an entry is also made in a table.
* As more content is added, this would require more space on the PA servers.
* The preferred approach can be to have the PA database indexed with meta information about each content item. The actual content file would reside on the original content provider's server and would be pulled and shown when there is a request for it.
* Before storage in the PA database and content archives, we need to check whether the same content has been stored before. It is not possible to check every bit of content for this. Instead, when we parse the XML file generated by the content provider, we can store its date-time stamp in a table. Subsequently, when we receive the next feed, we can compare the date-time of the new file with that of the old file and decide accordingly. This would provide a reasonable level of checking.
* The PG database does not have the concept of an XML DB as in Oracle. Hence, while raw XML can be stored in the PG database, it would not be easy to update that XML through SQL if required. If Oracle is used, the XML metadata for each content file can be stored in the XML DB and updated if necessary.
* A uniform design may be to drop XML storage and have individual field storage using Oracle's native datatypes, based on the metadata XML obtained from the provider.
* Archival also needs to be addressed. The archival process can be implemented in the following manner:
When a user requests a content item, it is pulled from the original provider's server and saved on the PA server. An index is created to identify the user-content relation. It should be possible to avoid storing the same file multiple times for different users.
Also, if content that has already been archived for user A is requested by user B, then instead of pulling it from the remote server it can be served from the PA server.
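The archival behaviour described above can be sketched as a small cache with a user-content index. This is a hedged illustration only: the `Archive` class, its method names and the `fetch_remote` callable are hypothetical, and in-memory dictionaries stand in for the PA file store and index table.

```python
class Archive:
    """In-memory stand-in for the PA archive; all names are hypothetical."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable: content_id -> bytes
        self._files = {}       # content_id -> bytes, each file stored once
        self._user_index = {}  # user_id -> set of content_ids

    def request(self, user_id, content_id):
        """Serve content, pulling from the provider only on first request."""
        if content_id not in self._files:
            self._files[content_id] = self._fetch_remote(content_id)
        # Record the user-content relation without duplicating the file.
        self._user_index.setdefault(user_id, set()).add(content_id)
        return self._files[content_id]

    def contents_for(self, user_id):
        """Return the set of content ids archived for a user."""
        return set(self._user_index.get(user_id, ()))
```

With this shape, a request from user B for content already archived for user A is answered from the local store, and the remote provider is contacted once per content item.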
Problem with this approach: How to create a preview of new content without pulling the files from the remote server?
Decision: Content and icons would be pulled along with the metadata, at the same time, to solve this problem.
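The date-time stamp comparison proposed in the storage approach above could look roughly like the sketch below. The `should_process` function and the `last_seen` dictionary (standing in for the table of stored feed time stamps) are assumptions made for illustration.

```python
from datetime import datetime


def should_process(provider, feed_timestamp, last_seen):
    """Return True if the feed is newer than the one last processed.

    `last_seen` maps a provider name to the datetime of the last
    processed feed and stands in for a database table. It is updated
    in place when a newer feed is accepted.
    """
    previous = last_seen.get(provider)
    if previous is not None and feed_timestamp <= previous:
        return False
    last_seen[provider] = feed_timestamp
    return True
```

A feed whose time stamp is not later than the stored one is skipped, so only new feeds are parsed and loaded.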
Assumption:
The content provider would make an XML feed available on their server such that it can be accessed by PA's agent, whether the agent runs on the PA side or on the provider side.
PA can either deploy a small scheduler application on the content provider's side to read the content and provide the metadata XML feed, or it would be the responsibility of the provider to do so. It is expected that large providers would make the XML available themselves, whereas for smaller providers PA may have to deploy an agent.
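As a rough sketch of how an agent might read such a feed, the example below parses a small XML document into metadata dictionaries using Python's standard `xml.etree.ElementTree` module. The feed layout, the element and attribute names, the provider name and the URLs are all invented for illustration; they are not PA's schema or any provider's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout; the element names below are assumptions,
# not PA's actual schema.
SAMPLE_FEED = """\
<feed provider="ExampleProvider">
  <item>
    <name>sunset.jpg</name>
    <url>http://provider.example/content/sunset.jpg</url>
    <size>20480</size>
  </item>
  <item>
    <name>empty.jpg</name>
    <url>http://provider.example/content/empty.jpg</url>
    <size>0</size>
  </item>
</feed>
"""


def parse_feed(xml_text):
    """Parse the assumed feed layout into a list of metadata dictionaries."""
    root = ET.fromstring(xml_text)
    provider = root.get("provider")
    items = []
    for item in root.findall("item"):
        items.append({
            "provider": provider,
            "name": item.findtext("name"),
            "url": item.findtext("url"),
            "size": int(item.findtext("size", default="0")),
        })
    return items
```

A connector for a large provider with its own metadata standard would replace `parse_feed` with a parser for that provider's format while returning records of the same shape.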
Provisioning XML schema for PurpleAce:
The current metadata being used for each image is:
(Format
...
...