Postgres – Pre-conference

I’m heading off for a week-long vacation out east tomorrow, and after that I will be attending my first developer conference – the Postgresvision conference in Boston from June 27-28 – so I won’t be doing much in the way of updates for the next two weeks. I didn’t make it all the way through the postgres docs, but I got through the bulk of them, and it was a nice mix of refreshing my existing knowledge, clarifying some concepts I was fairly familiar with, and getting an overview of some concepts that were mostly new to me. I’m hoping to eventually get through the rest of the docs, maybe during some downtime at the conference.

After the conference I will write some posts about some of the lectures and workshops that I attend, and then I will hopefully be focusing on 3 areas for the month of July:

  • Finishing up some more tests and making minor improvements on quizipedia
  • Finishing up the postgres docs
  • Looking for an open-source project to get involved in

Postgres: Text Searches

So I’ve found what is easily the most interesting part of SQL I’ve learned about so far: the full text search. Basically, it allows you to do “intelligent” term searching against a longer document (more intelligent than brute-force matching with the LIKE operator). The two main data types involved are the tsvector and the tsquery, which essentially represent a ‘normalized’ document and search, respectively. A tsquery contains the search query, which is matched against the tsvector.

The document to search must first be converted to a tsvector. A parser scans through the document and breaks everything into tokens (numbers, words, hyphenated words, etc.). These tokens are then passed through a series of dictionaries, which attempt to recognize and normalize each token. This includes stripping a word down to a root word, and obtaining a match off that.

For example, a simple document may contain the text: “A banker visited the Moon-Base”. The tokens produced by the parser would be something like “A”, “banker”, “visited”, “the”, “Moon-Base”. The dictionaries will then likely resolve “banker” to the root “bank”, and may split “Moon-Base” into “moon” and “base”. They would also simply remove the stop words – common words that are likely of no use (“A”, “the”, …). The result of the dictionary pass is a set of lexemes, which are just normalized tokens. These are stored in the tsvector.
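
Just to make it concrete, here is a minimal sketch of that conversion in SQL, using the built-in to_tsvector function with the default ‘english’ configuration (the exact lexemes it produces depend on the configuration’s parser and dictionaries):

    -- Normalize the example document into a tsvector; stop words ("A", "the")
    -- are dropped and the remaining tokens are reduced to lexemes, each tagged
    -- with its position in the document.
    SELECT to_tsvector('english', 'A banker visited the Moon-Base');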

Then a query can be run against the vector to see if it matches. The tsquery supports the standard boolean operators (NOT, AND, OR, written as !, &, and |). So something like ‘(moon | sun) & visiting & !gibberish’ would match, but ‘(star & moon)’ would not. There is also a ‘FOLLOWED BY’ operator (<->), to check if one word immediately follows another. So the query ‘base <-> moon’ would not match, but ‘moon <-> base’ would.
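
A rough sketch of those matches in SQL, using the @@ match operator against the example document (the expected results in the comments are the ones described above):

    -- Matches: 'moon' is present, 'visiting' normalizes to the same lexeme as
    -- 'visited', and 'gibberish' never appears.
    SELECT to_tsvector('english', 'A banker visited the Moon-Base')
           @@ to_tsquery('english', '(moon | sun) & visiting & !gibberish');

    -- Does not match: 'star' is not in the document.
    SELECT to_tsvector('english', 'A banker visited the Moon-Base')
           @@ to_tsquery('english', 'star & moon');

    -- Matches: 'moon' is immediately followed by 'base' (the reverse order would not match).
    SELECT to_tsvector('english', 'A banker visited the Moon-Base')
           @@ to_tsquery('english', 'moon <-> base');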

This is just a very simple example and explanation of what it’s capable of. The best part is that this is very applicable to my Quizipedia application, and some of the terms I’ve learned about are even things I’ve been implementing on a small scale on my own. For example, I’ve created a list of “common words” that automatically get ignored by the relevant-word algorithm, which is essentially what the dictionaries do when they catch stop words. So I will definitely be exploring this topic a lot more, and hopefully applying it to my current and future projects.

Postgres: Indexes

Today I got through the postgres chapter on indexes. I actually thought there would be a lot more to indexes, but it turns out I’m already fairly familiar with the important details. There are some index implementations in postgres that I was totally unfamiliar with (GiST, GIN, BRIN), but the B-tree is easily the most important implementation as far as I’m currently concerned.

There were a few enlightening lessons, such as the importance of ordering the columns of a multicolumn index properly, and the concept of partial indexes, but for the most part this ended up being a formalization and review.
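
To make those two points concrete, here is a small sketch (the orders table and its columns are made up purely for illustration):

    -- Multicolumn B-tree index: column order matters, since the index is most
    -- useful for conditions that constrain the leading column(s).
    CREATE INDEX idx_orders_customer_created
        ON orders (customer_id, created_at);

    -- Partial index: only rows satisfying the WHERE predicate are indexed,
    -- which keeps the index small for a common, selective query.
    CREATE INDEX idx_orders_unshipped
        ON orders (created_at)
        WHERE shipped = false;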

Postgres: Functions, Operators, Conversion

Today I got through the functions, operators, and conversion sections of the postgres docs. There are lots of different functions in postgres, and a lot of overloading on operators and functions, so this was just another overview of the possibilities. A thorough understanding of these concepts will require a lot more experience; if someone asked me right now to name all of the functions associated with the bytea data type, I would still be fairly lost. BUT my goal in doing this is to get a general idea of the scope of problems that postgres is capable of solving. So if I arrive at some implementation problem where I need to build a scheduling and pricing system for a product, I know that postgres is ABLE to provide a money data type AND it can handle datetimes, including intervals and time zones. Postgres may not turn out to be the optimal solution for a given problem, but at least it’s another option now.
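
A few toy examples of the kind of overloading I mean (the literal values are made up, but the operators and functions are ones the docs cover):

    SELECT '19.99'::money + '0.01'::money;         -- money arithmetic
    SELECT now() + interval '2 weeks';             -- timestamp plus an interval
    SELECT date '2017-06-27' - date '2017-06-01';  -- date subtraction yields a number of days
    SELECT to_char(now(), 'YYYY-MM-DD HH24:MI');   -- a formatting/conversion function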

Postgres: Data types

Today I (mostly) trudged through the documentation on postgres data types. As with some of the other docs, I have some familiarity with most of the concepts, and reading through them is more about completeness and making sure I haven’t missed anything. I got a better understanding of the differences between all the different number types, and in my experience the important ones are simply integer, numeric, and serial. I’ve come across some of the more complicated ones, such as uuid and point, before, and I learned some new ones such as money and the network address types.
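
As a quick sketch of those three number types, here is a made-up table definition:

    CREATE TABLE product (
        id       serial PRIMARY KEY,      -- auto-incrementing integer
        quantity integer NOT NULL,        -- whole numbers
        price    numeric(10, 2) NOT NULL  -- exact decimal with a fixed scale
    );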

One of my big discoveries was the fact that you can specify a timestamp with a time zone. Time zones are something I’ve run into issues with before, issues that likely could’ve been avoided if I’d known about this, but better late than never! I’m still a bit curious about the special time input ‘allballs’, which is equivalent to 00:00:00.00 UTC.
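
A couple of quick examples for reference (the timestamp literals are made up):

    SELECT '2017-06-27 09:00:00-04'::timestamptz;  -- normalized to UTC, displayed in the session's time zone
    SELECT '2017-06-27 09:00:00'::timestamp;       -- no time zone information at all
    SELECT 'allballs'::time;                       -- the special input value, equal to 00:00:00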

I will likely be coming back to the text search types and the jsonb types. I have some experience implementing text searches using ElasticSearch, but some of these docs are hinting that I may be able to do some of the same functionality in Postgres, which would be a BIG deal for me. Further down in the docs I see a whole section on text searches, so that’s something to look forward to.

Postgres: Digging into queries

I got through some more postgres documentation today. I think reading docs can only hold my concentration for 1-2 hours max. Today I learned some formal definitions for postgres functionality that I’m familiar with, such as joins, groupings, and the select list. Some new concepts for me were GROUPING SETS, ROLLUP, and CUBE. These are essentially shortcuts for doing complex groupings: grouping sets let you display multiple groupings within the same result set, a rollup iteratively drops trailing columns from the grouping list, and a cube produces groupings for the full power set of the listed columns. Off the top of my head, I can’t think of anywhere these would’ve been useful for me in the past, but at least I know they exist.
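
A small sketch of the three, against a hypothetical sales(region, product, amount) table:

    -- GROUPING SETS: several independent groupings in one result set.
    SELECT region, product, sum(amount)
    FROM sales
    GROUP BY GROUPING SETS ((region), (product), ());

    -- ROLLUP (region, product) is shorthand for grouping by
    -- (region, product), (region), and the grand total ().
    SELECT region, product, sum(amount)
    FROM sales
    GROUP BY ROLLUP (region, product);

    -- CUBE (region, product) covers the power set:
    -- (region, product), (region), (product), and ().
    SELECT region, product, sum(amount)
    FROM sales
    GROUP BY CUBE (region, product);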

The WITH clause, including the RECURSIVE option, is something I’ve seen before but have always been a little intimidated by, since it isn’t as intuitive as the simpler statements. I got a brief overview of it, and it seems like WITH is something I could make use of (not so much RECURSIVE), but I will definitely need some hands-on practice before feeling confident with it.
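
Here is a minimal, non-recursive sketch of the kind of WITH query I have in mind (the orders table is hypothetical):

    -- The WITH clause names an intermediate result that the main query
    -- can then read as if it were a table.
    WITH recent_orders AS (
        SELECT customer_id, count(*) AS order_count
        FROM orders
        WHERE created_at > now() - interval '30 days'
        GROUP BY customer_id
    )
    SELECT customer_id
    FROM recent_orders
    WHERE order_count >= 3;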

Postgres: Moving beyond the basics

I’ve continued reading through the postgres docs. The first little while was slow, since it was mostly review and a lot of syntax and semantics. Right now I’m looking for a broad overview so I can be familiar with the functionality and workings of postgres at a mid-to-high level, so I’m not too focused on syntax specifics; I think that will come with more use. Today I got into some of the deeper concepts, including inheritance, partitioning, and schemas. I’ve heard about these and have a general idea of what they do, but reading these chapters gives a bit more detail on HOW they work. I would still prefer documentation that included some exercises at the end of each chapter, and I may look into some online courses to that end. In the meantime, I found a set of simple exercises to give me some hands-on practice: https://pgexercises.com/gettingstarted.html
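
For two of those concepts, here is roughly what the syntax looks like (the cities/capitals example is adapted from the docs; the reporting schema is made up):

    -- Schemas: namespaces that group tables within a single database.
    CREATE SCHEMA reporting;
    CREATE TABLE reporting.daily_totals (day date, total numeric);

    -- Inheritance: the child table gets all of the parent's columns.
    CREATE TABLE cities   (name text, population integer);
    CREATE TABLE capitals (country text) INHERITS (cities);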

I’ve also found a few ideas for how I can improve existing postgres databases that I’m currently working with. Some of the improvements might include stronger constraints, adding schemas for users, and tightening permissions for certain kinds of access (e.g. read-only data).
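
For the permissions idea, a sketch of what that might look like (the role name is made up):

    -- A read-only role that can query everything in the public schema
    -- but cannot modify anything.
    CREATE ROLE reporting_ro NOLOGIN;
    GRANT USAGE ON SCHEMA public TO reporting_ro;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_ro;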

Postgres Intro

Today I started reading through the postgres docs. I just got through the Preface and the intro tutorial. It was mostly review, but I wanted to get through it anyway just to make sure there weren’t any important or basic concepts I’d missed. So far there are no real surprises, and it was a nice refresher on some familiar terms, but I think I’d like to find some more practice on certain concepts, such as inheritance and window functions. I’m mainly used to using a relational database for simple queries (SELECT, UPDATE, DELETE) and the occasional join, and letting procedural logic in another application layer handle the work that these techniques might otherwise cover. But I’d like to learn them anyway, just to have some more tools in case I ever need them. I’m also going to try and find some simple postgres challenges, similar to the programming katas I’ve tried on Codewars.


June Goal: PostgreSQL

At the end of June I am going to be heading to Boston for the postgresVision conference. It will be my first tech conference, and while I have a fairly good foundation in postgres, I’d still like to learn a bit more before going there. So for the month of June, I am going to be focusing on becoming an expert in postgres!

My main resource will be the official documentation found here, but I am also going to try and apply as much as possible to my postgres implementation in the quizipedia application (and of course at work). So I’ll be posting some summaries of what I learn throughout the month. After the conference, my tentative plan is to switch back to some of the larger scale goals in quizipedia, but this month will be a well-earned context-switch.