A guy at work once remarked to me that he thought relational databases were a serious failure and have resulted in loads more pointless work for developers; a new job title in "DBA" and basically a whole load of wasted money. I didn't agree at the time, but having thought about it since, I think he's right. Dr. Codd invented relational databases and SQL in 1970. The theory was that you wrote SQL, and the computer optimised this and worked out how to get the answer to your question most efficiently. This didn't work well then, but Dr. Codd figured that with computers increasing in speed all the time, it wouldn't be too long before this was possible.
However, it's now 2007, and what do we see:
- People still have to optimise queries after 37 years of growth in computing power. That's when computers are 16 million times more powerful than they were in 1970 (Moore's law anyway)
- They have to collaborate with DBAs who will add in indices to support commonly used queries. Optimising SQL and indexes etc. is what is usually referred to as a black art. It's not amazingly tricky, but basically very few people know how to do it
- Databases' own ability to optimise queries is remarkably poor. They always advertise their super clever optimisers, and I'm sure that's the case, but they really don't cut it (no great surprises; it's an optimisation problem with no well-defined solution)
- What they are good for is management information-style queries; however, in 2007, people aren't using them for that, as you'll hold too many locks for too long and impact performance of an online database, plus queries take too long, so people are hiving them off to OLAP cubes or the like and doing the MI stuff somewhere else
So I think the guy was pretty much right. Relational databases don't really suit what they're currently used for.
But what do YOU think?
― Keith, Friday, 6 July 2007 22:35 (twelve years ago) link
― James Redd and the Blecchs, Monday, 9 July 2007 05:55 (twelve years ago) link
In my experience people treat this problem like there are only two possible solutions - running text manipulation and regex stuff against massive flat files, or SQL against massive tables. Funnily enough it seems like it's easier to scale and optimize the former, given a limited set of inputs, but whenever people want to do more than two or three things with the data they turn to SQL. What else is there? What's better than a RDB that lets me do lots of different manipulations without building a new hash table or whatever for every request?
I don't know enough how this stuff works.
― El Tomboto, Monday, 9 July 2007 08:30 (twelve years ago) link
Tom, I don't think it's because other practical alternatives do exist (some might argue that), but for sure SQL is by far the most common. I guess my argument is whether or not it has succeeded in its goals, which I don't think it has.
It's certainly easier to optimise the former, though, because you've full control over the what's going on. If an RDBMS doesn't do it the way you want it, there's quite often nothing you can do about it, though you are trading off making the data accessible to others (amongst other things) by building a proprietary format.
From a commercial point of view there's no argument, SQL and relational databases make absolute sense, as you can generally find people who know how to use them. If you go looking for Objectstore experts, you're not going to find them, and even if your top techies say they understand it, it's probably a recipe for a failed project.
However, imagine an alternative which looks something like an OO database. Suddenly all this pissing about with indexes and query optimisation is out the window, because 99% of 'online' style queries look pretty much like tree traversal, which is extremely quick (and well-defined). For the remaining 1%, when the data is written, write it both to the online OO database and to something that copes with offline management info style queries (like an RDBMS or whatever). Could this work better? As with everything, there are tradeoffs being played off against each other here, but I guess we'll never know because the cost of data migration is so high (and so heavily loaded with risk) that people just don't do it.
― Keith, Monday, 9 July 2007 17:18 (twelve years ago) link
the dba here is the slowest typer i have ever met. he will also cut and paste single words ('select') from the line above. i can feel myself age whenever i have to sit with him to do something.
wrote the search engine here at work using oracle Text indexes etc. only everything was designed and written using small examples. we had someone searching for something the other day (music artists, releases and tracks) which returned >30000 rows (none of which was an exact match for what they wanted) and took >12 minutes. oh um.
(the problem being that if they search for something that doesn't exist but contains a very common word, 'i' in this case, the search eventually gets around to returning everything containing the word 'i'. string together a query containing lots of common words and one non-existant one and i'm sure i could get that up to 200,000 results)
― koogs, Friday, 13 July 2007 16:37 (twelve years ago) link
Certainly, the ILX fulltext search will drop common words like "I" from the search to prevent precisely this issue.
― Keith, Saturday, 14 July 2007 18:04 (twelve years ago) link
SQL is absolute rubbish. It's too slow, it's too old, and it's way too tedious running query after query trying to tell the computer exactly what you want it to do. That's why DBAs make so much money. They're the only ones smart enough to know how to work with this piece of horseshit!
But there's really no alternative to it, is there? Somebody should make one.
― Mr. Snrub, Thursday, 19 July 2007 04:56 (twelve years ago) link
the above isn't, unfortunately, full text search so i can't just drop things (although AND and THE are dropped). it's artist names, release names or track titles. english and swedish (although we do nothing with the swedish equivalents of THE or AND (which are different depending on gender and plurality = can of worms))
i like sql but, yes, it is a bit of a black art getting what you want out of it. i hate the bits that all dbs add to it to do the non-standard stuff like sequences, oracle being the worst. (dba is a slightly different job from db programmer though, much the same way the bloke who looks after the linux machines here couldn't write 'hello world' (ok, maybe in bash))
― koogs, Thursday, 19 July 2007 10:11 (twelve years ago) link
SQL pwned me yesterday when I wanted to extract samples taken at 6:00:00, 6:10:00, 6:20:00 from a dataset that had samples logged every second, starting at 5:55:15. I'm sure it can be done, just wasn't smart enough to get it sorted in the timeframe when I needed it.
― Jaq, Thursday, 19 July 2007 13:35 (twelve years ago) link
where to_char(date, 'HH24:mm:ss') like '06:_0:00'
― koogs, Thursday, 19 July 2007 16:31 (twelve years ago) link
Koogs! Type safety!
I mean that would work, but in principle at least, it's scarily unsafe! Plus it's a tablespace scan.
― Keith, Thursday, 19 July 2007 17:01 (twelve years ago) link
quick and dirty, my middle name. s.
― koogs, Thursday, 19 July 2007 18:48 (twelve years ago) link
Should have been more explicit - it's Access, so no to_char. I thought it could be as simple as WHERE time LIKE '__:_0:00 __', but no. Not even converting the field to text first (from Date/Time) let that work. I was also thinking I could convert the field to a pure number (thinking I'd get the UCT and could work some math magic on it), but that throws errors.
― Jaq, Thursday, 19 July 2007 22:35 (twelve years ago) link
I would read the primary key and the time in, get them all and sort them out in program code, rather than trying to do it in SQL. If the table's too big, then you could write a loop and get every x minutes' worth of data and pick out the rows you want.
Once you've got them, execute a big IN (...) query to get the rows you want.
You could avoid that step if you've enough memory/time to get the whole rows at a time, I guess.
You're right, it's a messy problem!
― Keith, Thursday, 19 July 2007 22:47 (twelve years ago) link
i guess my solution is a function of a) being self taught (and with an idiot for a teacher) and b) using vi or grep for editing / sorting needs and, consequently, seeing everything as a string.
getting away from Jaq's specific problem for the minute:
would it be better if you cut down the result space first, like (pseudocode alert!)
where time.hour = 6 and time.seconds = 00 and to_char(time, 'mm') like '_0'
(in the same way "make && make install" or "if (str == null || str.trim().length() == 0)" (ie there's an implicit AND or OR in there and the second part only executes if the first part passes / fails) or is what gets executed a whim of the oracle developers?
or would i be better off going the whole hog and doing it all as numbers:
where time.hour = 6 and time.seconds = 00 and (time.minutes % 10) = 0
returning to Jaq's problem, DatePart()?
― koogs, Friday, 20 July 2007 09:15 (twelve years ago) link
I could see that working, using DatePart() to return the minute and second. WHERE (((DatePart("n",FI15B_TI15D.TIME)) Like '_0')) AND (((DatePart("s",FI15B_TI15D.TIME)) = '00')) doesn't throw errors but also doesn't return anything. Running it with just the seconds clause or just the minutes clause also don't return anything. I'll keep poking at it though.
This is a dataset extracted from our running system as .csv files, btw. About 35k records in a purpose-built .mdb so I can hack away at it w/ impunity.
There were hard brackets around the tablenames, btw, but the BBCode parser got all choked up on them.
― Jaq, Friday, 20 July 2007 15:32 (twelve years ago) link
put the datepart()s in the select part and take them out of the where part and see what they are returning, probably isn't what you expect.
― koogs, Sunday, 22 July 2007 20:48 (twelve years ago) link
True. I bailed on it though and wrote up a function in my favorite environment (which handles .dbfs natively, and very nicely) instead.
― Jaq, Sunday, 22 July 2007 22:28 (twelve years ago) link
ha ha. just been sent some optimised versions of the queries i'd written:
he optimised out an entire join against the 'artist' table. which was nice of him. except the query was meant to be returning a list of artist objects to fill up the Artist ejb things.
(he also removed a left join which meant we had to populate the previously left joined table in order to get the required rows returned - 413,000 mostly empty rows. per service (of which we have 4 and counting.)
― koogs, Monday, 23 July 2007 17:37 (twelve years ago) link
― Jaq, Wednesday, 1 August 2007 15:49 (twelve years ago) link
How weird.. Eveything I've ever seen in the past 10 years or so has suggested Oracle #1 and DB2 #2... Not that that's anything but disheartening too.
― Keith, Wednesday, 1 August 2007 15:52 (twelve years ago) link
I'm wondering how the question was framed. Also, not even 700 people asked = this is a very very very small sample.
― Jaq, Wednesday, 1 August 2007 17:03 (twelve years ago) link
Thing is, my enterprise uses it, but the strategy is DB2 and Oracle; SQL Server is tolerated for packages, but not in-house applications. By that logic, we would have voted yes, to this study, when in fact, it's accounts for about 0.5% of usage as compared with DB2.
― Keith, Wednesday, 1 August 2007 18:13 (twelve years ago) link
― stet, Monday, 3 September 2007 04:00 (twelve years ago) link
oh shit son
― Catsupppppppppppppp dude 茄蕃, Friday, 14 September 2007 21:57 (twelve years ago) link
― am0n, Thursday, 20 September 2007 03:38 (twelve years ago) link