    As an addendum to my previous post: there are several tables in the db with 10+ million records. In a fully normalised db several of these tables would be redundant, as their contents could be calculated on the fly each time. I chose speed over compactness of the data. To give an idea of size, the raw tables occupy close to 1 GB of space. Dumped to backup files and compressed, they still occupy 300+ MB. At these sizes, Excel runs out of rows before you get 10% of the way through the data. Early versions of the dataset sat on MS Access, which according to the specifications goes to 2 billion-odd records. In practice it falls apart somewhere between 10,000 and 100,000 records, depending on how big each record is. With care an MS Access table can get to 1 million records, but don't let a complex query near the data or run it over a network.
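
    To illustrate the speed-versus-compactness trade-off, here is a minimal sketch in Python. It is not my actual schema: `sqlite3` stands in for the db engine, and the `prices` and `price_summary` tables and all function names are made up for the example. The same summary is either recalculated on the fly each time or stored once as a redundant table.

```python
import sqlite3

# Hypothetical summary query: per-share row count, low and high.
SUMMARY_SQL = """
    SELECT code, COUNT(*) AS n_days, MIN(close) AS low, MAX(close) AS high
    FROM prices GROUP BY code ORDER BY code
"""

def build_demo_db():
    """Create a tiny in-memory db with a made-up raw prices table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (code TEXT, day TEXT, close REAL)")
    rows = [
        ("AAA", "2020-01-01", 1.00),
        ("AAA", "2020-01-02", 1.10),
        ("AAA", "2020-01-03", 1.30),
        ("BBB", "2020-01-01", 5.00),
        ("BBB", "2020-01-02", 4.00),
    ]
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    return conn

def summary_on_the_fly(conn):
    """Normalised approach: recompute the summary on every call."""
    return conn.execute(SUMMARY_SQL).fetchall()

def materialise_summary(conn):
    """Denormalised approach: store the summary as a redundant table."""
    conn.execute("DROP TABLE IF EXISTS price_summary")
    conn.execute("CREATE TABLE price_summary AS " + SUMMARY_SQL)

def summary_from_table(conn):
    """Read the stored summary; no aggregation over the raw rows."""
    return conn.execute(
        "SELECT code, n_days, low, high FROM price_summary ORDER BY code"
    ).fetchall()
```

    On five rows there is no difference in speed, of course. On tens of millions of rows the stored table costs disk space and has to be rebuilt after each load, but every later read skips the aggregation.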

    It really comes back to knowing your data and your tools, especially what the tools' strengths and weaknesses are.

    I did not sit down and write a specification for what I have. It grew organically. I started collecting data, then analysing it, then asking more questions about the data, then collecting more data, all the while adding more data over time. I've applied this to my professional work and to developing my shares analysis. As the analysis proceeds, you think of more things that you would like to do if you had the data. You then go and search for data sources and collection methods.

    As far as the shares go, I've found a lot of data out "there", and free. Much of it is in pretty presentation format, which needs processing back to raw data suitable for loading, whether into a db as I do it, into a spreadsheet, or into any other package. This pushes your skills, first in searching out data and then in processing it. In my case the data collection sits on a Linux box, with a series of scripts that collect and process the data and email all the results to me, ready each morning. The process started out hard-coded for every collector. It slowly grew, and now many of the collectors are generic and driven by their own table in the db. It looks great and elegant, but it has grown over 15+ years and looks nothing like what I started out with.

    DKit
 