<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>CRAIG KERSTIENS</title><link>/</link><description>CRAIG KERSTIENS</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Mon, 11 Nov 2024 07:57:56 -0800</lastBuildDate><atom:link href="/index.xml" rel="self" type="application/rss+xml"/><item><title>Getting comfortable with psql</title><link>/2024/11/11/Getting-comfortable-with-psql/</link><pubDate>Mon, 11 Nov 2024 07:57:56 -0800</pubDate><guid>/2024/11/11/Getting-comfortable-with-psql/</guid><description>&lt;p>psql is the command-line interface that ships with Postgres. It&amp;rsquo;s incredibly powerful for working with Postgres, and the learning curve is gentle enough that you can quickly start to feel like an expert working with your database.&lt;/p>
&lt;p>Just a rundown of a few things to get you started:&lt;/p>
&lt;p>Once connected in psql you can get an idea of all utility commands available with:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">\&lt;/span>&lt;span style="color:#f92672">?&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A handy thing I use all the time is &lt;code>\d&lt;/code>.&lt;/p>
&lt;p>&lt;code>\d&lt;/code> will describe the tables within the database. You can also add a table/index/etc. onto it to describe that specific object, such as:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">\&lt;/span>d accounts
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>There are a number of options you can set in your psqlrc (config) file to customize your CLI experience. But you can also toggle those when directly working in psql.&lt;/p>
&lt;ul>
&lt;li>&lt;code>\timing&lt;/code> will give you the time it took to run your query&lt;/li>
&lt;li>&lt;code>\x auto&lt;/code> will switch to expanded (vertical) output whenever a result is too wide for your terminal&lt;/li>
&lt;li>&lt;code>\pset pager 0&lt;/code> turns off your pager (use 1 to turn it back on)&lt;/li>
&lt;/ul>
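&lt;p>If you find yourself toggling these every session, you can persist them in your psqlrc. A minimal sketch (these settings are just my own preferences, tweak to taste):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- ~/.psqlrc
\timing on
\x auto
\pset pager 0
&lt;/code>&lt;/pre>&lt;/div>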
&lt;p>Oh, and for editing a query in your editor of choice: make sure you set your &lt;code>$EDITOR&lt;/code> environment variable to the editor of your choice, though the only right choice is vim:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">\&lt;/span>e
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Just a few things to get you started working with psql.&lt;/p>
&lt;video
class="video-shortcode"
preload="auto"
controls
>
&lt;source
src="https://craigkerstiens.b-cdn.net/psql_basics.mp4"
type="video/mp4"
alt="Psql basics"
>
Your browser does not support video
&lt;/video></description></item><item><title>The future of Postgres?</title><link>/2024/10/18/The-future-of-Postgres/</link><pubDate>Fri, 18 Oct 2024 08:50:56 -0800</pubDate><guid>/2024/10/18/The-future-of-Postgres/</guid><description>&lt;p>I&amp;rsquo;m often asked what I think the &lt;em>future of Postgres&lt;/em> holds, and my answer has been mostly the same for probably 8 years now, maybe even longer. You see, for Postgres itself stability and reliability are core. So where does the new stuff come from if it&amp;rsquo;s not in the stable core&amp;hellip; &lt;strong>extensions&lt;/strong>.&lt;/p>
&lt;p>Extensions are where Postgres is unlike most other databases, allowing you to modify or, well, extend the standard Postgres behavior. You can build other storage backends, new types, etc. Postgres itself ships with a number of extensions within &amp;ldquo;contrib&amp;rdquo;. The list of contrib extensions hasn&amp;rsquo;t changed any time recently, but even contrib is a small sampling of what is possible. Beyond core there is a whole world of extensions; I want to dig into just a small sampling, starting with a few in core&amp;hellip;&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://www.crunchydata.com/blog/tentative-smarter-query-optimization-in-postgres-starts-with-pg_stat_statements">pg_stat_statements&lt;/a>&lt;/strong> is to me the most useful extension that exists. It records what queries were run, how long they took, and a number of other details about the queries. A key extension for managing performance of your database.&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://www.postgresql.org/docs/current/auto-explain.html">auto_explain&lt;/a>&lt;/strong> another one in contrib that is helpful for performance. For queries that run over a certain period of time will automatically log the explain plan–helpful for performance debugging.&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://www.postgresql.org/docs/current/pgprewarm.html">pg_prewarm&lt;/a>&lt;/strong> is useful for prewarming the cache, for example ahead of a failover.&lt;/p>
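&lt;p>Usage is about as simple as it gets (assuming a table named accounts):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- load the table into shared_buffers; returns the number of blocks read
SELECT pg_prewarm('accounts');
&lt;/code>&lt;/pre>&lt;/div>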
&lt;p>Let&amp;rsquo;s jump out of contrib a little bit now. Of course contrib isn&amp;rsquo;t everything that ships with Postgres–there is more, so explore for yourself.&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://github.com/citusdata/citus">Citus&lt;/a>&lt;/strong> is one of the (to date) more advanced extensions ever created. Citus turns postgres into a sharded, distributed, horizontally scalable database. Citus is especially built to work well for B2B style multi-tenant apps, and now after being acquired years ago part of Microsoft.&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://github.com/paradedb/paradedb">Pg_search&lt;/a>&lt;/strong> extends Postgres to support elastic-quality full text search directly within Postgres. I often say Postgres can do just about everything and be pretty capable. Things like time series and search it&amp;rsquo;s about 80% as good of some of the best in class options out there, but pg_search takes it further making it a full competitor to elastic, but Postgres.&lt;/p>
&lt;p>Those are some bigger ones, but you&amp;rsquo;ve also got a lot of small focused ones as well.&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://github.com/citusdata/pg_cron">Pg_cron&lt;/a>&lt;/strong> is incredibly handy. Created originally by &lt;a href="https://twitter.com/marcoslot">@marcoslot&lt;/a> while at Citus, it&amp;rsquo;s a small extension that does what it sounds like run scheduled jobs within Postgres. It is now a standard across all the major cloud providers. We leverage it heavily at &lt;a href="https://twitter.com/crunchydata">@crunchydata&lt;/a>, but went a little further and built a UI on top so you don&amp;rsquo;t have to worry about crontab styling of the jobs.&lt;/p>
&lt;p>Just yesterday at Crunchy Data we released &lt;strong>&lt;a href="https://github.com/CrunchyData/pg_parquet">pg_parquet&lt;/a>&lt;/strong>. It allows you to seamlessly copy to and from Parquet files with Postgres, following the Linux philosophy of small sharp tools. There have been some questions about other extensions that do something similar but also do a lot of other things. Marco described the why extremely well:&lt;/p>
&lt;blockquote>
&lt;p>We wanted to create a light-weight implementation of Parquet that does not pull a multi-threaded library into every postgres process. When you get to more complex features, a lot of questions around trade-offs, user experience, and deployment model start appearing.&lt;/p>
&lt;/blockquote>
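&lt;p>The small sharp tools philosophy shows up in the interface–it&amp;rsquo;s just COPY (the table and paths here are made up):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">COPY accounts TO '/tmp/accounts.parquet' (FORMAT 'parquet');
COPY accounts FROM '/tmp/accounts.parquet' (FORMAT 'parquet');
&lt;/code>&lt;/pre>&lt;/div>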
&lt;p>There are a whole lot of extensions out there. There are big robust ones that have been around forever (yeah, I&amp;rsquo;m talking about you, PostGIS), there are ones in contrib, there are ones that are literally created as jokes. Some try to do a lot; others, as I mentioned, have a very narrow focus like pg_cron. Regardless of whether you believe in large complex extensions that do a lot or small ones that do one thing but do it really well, if you want to know where the future of Postgres is I can&amp;rsquo;t imagine a world where extensions do not play a heavy role.&lt;/p></description></item><item><title>What problem are we trying to solve?</title><link>/2022/11/28/What-problem-are-we-trying-to-solve/</link><pubDate>Mon, 28 Nov 2022 11:23:56 -0800</pubDate><guid>/2022/11/28/What-problem-are-we-trying-to-solve/</guid><description>&lt;p>If you want to seem like the smartest person in the room, wait for a break in the conversation, after sitting quietly for 15 minutes, and ask &lt;em>&amp;ldquo;What problem are we trying to solve here?&amp;rdquo;&lt;/em> It works every time.&lt;/p>
&lt;p>We’ve all been there. You walk into a meeting, there are 10 people in it. It gets rolling, different folks chiming in. A few people seem to be taking personal notes, they always do. There is the person that scheduled the meeting, but they’re not the person that usually makes decisions. It’s primarily going back and forth between 3 individuals with 7 others watching on. You’re 15 minutes in, and while there has been lively discussion already… you find yourself back on the original point from minute 1, having seemingly lost 14 minutes, and you’re unsure to what end.&lt;/p>
&lt;p>A healthier environment ideally has an agenda sent out ahead of time. Based on the agenda there is a clear structure planned for the meeting which presumably points to a clear goal. Even without an agenda an alternative structure would be a clearly stated goal in the invite. Within the meeting you &lt;a href="https://www.heavybit.com/library/video/executive-communication">SCQA&lt;/a>-it up live. Huge bonus points if the meeting invite includes a link to the &lt;strong>notes doc&lt;/strong> where meeting notes will be transcribed by an &lt;strong>explicit scribe&lt;/strong> for the meeting, &lt;strong>sent out after&lt;/strong> for folks to agree/confirm as a good record of the meeting.&lt;/p>
&lt;p>As good a facilitator as you may be, you’re still going to end up in the first situation. Every time, wait 10-15 minutes, then ask the question. It seems to be more effective after you’ve had the detour vs. leading off with it in the first minute. Even better, internalize the question yourself vs. just asking it to make yourself look good (which will happen).&lt;/p></description></item><item><title>Gonna use a personal trick/hack? Use it often</title><link>/2022/11/22/Gonna-use-a-personal-trick/hack-Use-it-often/</link><pubDate>Tue, 22 Nov 2022 08:11:56 -0800</pubDate><guid>/2022/11/22/Gonna-use-a-personal-trick/hack-Use-it-often/</guid><description>&lt;p>What do I mean by trick/hack? This can be super flexible, but I’ll toss out a few of my own personal ones.&lt;/p>
&lt;p>One of the rules of my product management philosophy is if you’re going to use a tool or trick then use it often. This is in fact one of my favorite interview questions for PMs. If you ever interview with me, be prepared: it’s coming. It applies to engineering managers and others in leadership roles as well, but I’ve found it especially true for PMs.&lt;/p>
&lt;h3 id="getting-a-response-from-team-emails" >
&lt;div>
Getting a response from team emails
&lt;/div>
&lt;/h3>
&lt;p>It’s very common to have a team@ alias within your company, whether for the whole team or for smaller teams. As a PM you may need to email sales@ to get feedback on requested features, or feedback on the roadmap. Chances are you’ll spend some good time and effort crafting this email, being concise but also giving enough context for them to give you an informed response. If you’re good you’ve fed the email through the Hemingway app before you sent it and gotten it down to grade level 7 or lower. You send it on a Tuesday around 10am so it’s not lost in Monday triage, and not missed on a Thursday before a long holiday weekend. And then… silence. Nothing. Nada. Zip. Zilch.&lt;/p>
&lt;p>I’m not really sure of the reason for the crickets; perhaps at best they assumed a bunch of their peers got back to you privately. At worst they’re apathetic and care strictly about closing their next deal, not about giving feedback that helps long term growth. But I’m not here to dissect why. How do you get a response?&lt;/p>
&lt;p>Slightly restructure the email so that instead of going to the sales@ alias, it targets individuals. Pull the emails of everyone on the list, make the opening a generic “Hi,”, and have it go directly to the individuals via a mail merge.&lt;/p>
&lt;p>From experience, I’ve had CEOs (who I forgot to omit) email me back while they were out on vacation, apologizing for the slow response (of hours) and saying they’d get me feedback before week’s end.&lt;/p>
&lt;h3 id="turn-a-group-email-into-a-love-fest" >
&lt;div>
Turn a group email into a love fest
&lt;/div>
&lt;/h3>
&lt;p>As great as 1:1 emails are at soliciting a response, often you need to send out an announcement to a team. “We’ve got a big announcement tomorrow”. You’ll put the same work and effort into that team email as you did in the above, and again the usual response is crickets.&lt;/p>
&lt;p>My teams know I do this, and yet it still works. Get 1-2 other leaders (not necessarily management, but morale leaders) to know it’s coming and chime in on a reply-all to the team@ list with positive comments and praise. The same applies to new hire announcements that go to team@. You’re going to end up with folks not wanting to be left out of piling on. It’ll start a flow of feedback, public and private, about what is good in the communication and give you a sense of where it can improve.&lt;/p>
&lt;h3 id="what-are-your-hacks" >
&lt;div>
What are your hacks?
&lt;/div>
&lt;/h3>
&lt;p>These are just a few; as I keep writing up a longer form of my personal product philosophy I’ll layer in more of these nuggets. Every good PM I’ve ever worked with has had their own bag of tricks. Again, it’s one of my favorite interview questions, and I’ve stolen many great tricks from others over the years. But I’m curious, even without us needing to be in an interview: what are yours? @&lt;a href="https://www.twitter.com/craigkerstiens">craigkerstiens&lt;/a>&lt;/p></description></item><item><title>Unfinished Business with Postgres</title><link>/2022/05/18/Unfinished-Business-with-Postgres/</link><pubDate>Wed, 18 May 2022 08:52:56 -0800</pubDate><guid>/2022/05/18/Unfinished-Business-with-Postgres/</guid><description>&lt;p>7 years ago I left Heroku. There has been a lot of discussion over the past weeks about Heroku&amp;rsquo;s demise, whether it was a &lt;a href="https://twitter.com/sraney/status/1519516583042818049">success&lt;/a> or failure, and its current state. Much of this was prompted by the recent, and on-going, security incident, but as others have pointed out the product has been frozen in time for some years now.&lt;/p>
&lt;p>I&amp;rsquo;m not here to rehash the many debates of what is the next Heroku, or whether it was a &lt;a href="https://news.ycombinator.com/item?id=31373300">success&lt;/a> or failure, or how it could have been &lt;a href="https://news.ycombinator.com/item?id=31373394">different&lt;/a>. Heroku is still a gold standard of developer experience and often used in pitches as Heroku for X. There were many that tried to imitate Heroku for years and failed. Heroku generates sizable revenue to this day. &lt;a href="https://twitter.com/_adamwiggins_/status/1334595784373923840">Without Heroku&lt;/a> we’d all be in a worse place from a developer experience perspective.&lt;/p>
&lt;p>But I don’t want to talk about Heroku the PaaS. Instead I want to share a little of my story and some of the story of Heroku Postgres (DoD–Department of Data, as we were internally known). I was at Heroku in a lot of product roles over the course of 5 yrs, but most of my time was with that DoD team. When I left Heroku it was a team of about 8 engineers running and managing over 1.5m Postgres databases–a one-in-a-million problem happened once a week. We engineered a system that allowed us to scale without requiring a 50-person ops team just for databases.&lt;/p>
&lt;p>This will be a bit of a personal journey, but also hopefully give some insights into what the vision was and hopefully a bit of what possibilities are for Postgres services in the future.&lt;/p>
&lt;p>&lt;em>I wasn&amp;rsquo;t originally hired to work on anything related to Postgres.&lt;/em> As an early PM I first worked on billing, then later on core languages and launching the Python support for Heroku. It was a few months in when I found myself having conversations with many of the internal engineers about Postgres. &amp;ldquo;Why aren&amp;rsquo;t you using hstore?&amp;rdquo;, &amp;ldquo;Transactional DDL to rollback transactions is absolutely huge!&amp;rdquo;, &amp;ldquo;Concurrent index creation runs in the background while not holding a lock, this should always be how you add an index.&amp;rdquo; Now we had some great engineers, but it was the typical engineer that interacted through ActiveRecord and didn&amp;rsquo;t want to think about the database.&lt;/p>
&lt;p>As I found myself &lt;a href="https://www.craigkerstiens.com/2012/04/30/why-postgres/">evangelizing Postgres&lt;/a>, suddenly I was being recruited by the lead of the Postgres team to come and do marketing. I didn&amp;rsquo;t know anything about marketing and thought they were joking. A couple months later I found myself doing PM and marketing for DoD.&lt;/p>
&lt;h3 id="why-did-heroku-pick-postgres" >
&lt;div>
Why did Heroku pick Postgres?
&lt;/div>
&lt;/h3>
&lt;p>But, I&amp;rsquo;m getting a little bit ahead of myself. How did Heroku even start doing Postgres, and why? Running a PaaS (platform as a service) is a lot of work; running a database is a lot of work. In some sense doing both is splitting your focus. And I&amp;rsquo;m increasingly coming to the conclusion that platform companies will do best to &lt;a href="https://news.ycombinator.com/item?id=31185715">focus&lt;/a> on their platform and data companies will do best to focus on the data. Well, way back in the day we had all these Rails developers asking for a database and we thought how hard could it be? (It was more work than we expected.) So we&amp;rsquo;re gonna run a database; the question becomes which one? Most folks didn&amp;rsquo;t have a strong opinion, but one of our security/ops engineers chimed in: &amp;ldquo;Postgres has always had a good record of being safe and reliable for data, I&amp;rsquo;d go with it.&amp;rdquo;&lt;/p>
&lt;p>And with that, we were building and launching Heroku Postgres.&lt;/p>
&lt;p>&lt;em>The first version of Heroku Postgres, no automation, no self service provisioning, you opened a ticket and we’d correspond and ask when you wanted us to set it up for you.&lt;/em>&lt;/p>
&lt;h3 id="before-heroku-postgres-was-postgres" >
&lt;div>
Before Heroku Postgres was Postgres
&lt;/div>
&lt;/h3>
&lt;p>The very first version, before we committed to Heroku Postgres, was internally known as Shen. The model was much more akin to the shared hosting that was common for that time and place. Within a single instance we’d pack in a lot of Postgres databases, simply running createdb, creating a user for you, then giving you access to that DB. This worked fine for those just kicking the tires and building small apps, but despite our telling people not to use it for production, they continued to.&lt;/p>
&lt;p>While Heroku had Postgres before Heroku Postgres, it became its own project–an orchestration layer treating databases as a first-class citizen–around 2010. The initial codename for the project was &amp;ldquo;&lt;a href="https://gist.github.com/adamwiggins/7d0e0805e0e44870f17f">bifrost&lt;/a>&amp;rdquo;.&lt;/p>
&lt;h3 id="the-original-design" >
&lt;div>
The original design
&lt;/div>
&lt;/h3>
&lt;p>Heroku Postgres was designed around a central FSM (finite state machine) that would orchestrate the databases. This design pattern to my knowledge came from &lt;a href="https://www.twitter.com/pvh">@PvH&amp;rsquo;s&lt;/a> work on and appreciation of video game system design. It felt like a novel approach for the software being built at the time. The fact that it is a more common design pattern now shows what a great design it was, and how far ahead of its time that team was building.&lt;/p>
&lt;p>It was a basic Ruby application that would go out, check the state of databases, and go through the needed steps when interacting with either Postgres or AWS APIs. AWS APIs back then were not what they are now, and this allowed us to build in the necessary retries, redundancies, and quality of service.&lt;/p>
&lt;h3 id="sometimes-youre-good-sometimes-youre-lucky" >
&lt;div>
Sometimes you&amp;rsquo;re good, sometimes you&amp;rsquo;re lucky
&lt;/div>
&lt;/h3>
&lt;p>Sometimes you&amp;rsquo;re good, sometimes you&amp;rsquo;re lucky, sometimes it&amp;rsquo;s both. Over time we built out more reliable provisioning and monitoring of databases. In early 2011 we felt a need to continue improving this. It was the early days of EC2 and reliability wasn&amp;rsquo;t the strongest spot, instances could go offline.&lt;/p>
&lt;p>Per &lt;a href="https://www.twitter.com/danfarina">@danfarina&lt;/a> &amp;ldquo;We were thinking about working on replication but skipping over archiving (by more carefully managing state between servers, e.g. by directly moving things through rsync, which was/is still pretty normed postgres stuff predating pg_basebackup) but then one of the shared databases (shen) had a near miss when a disk was lost that caused a rather horrific amount of effort and some nailbiting in restoring from pgbackups.&amp;rdquo;&lt;/p>
&lt;p>&amp;ldquo;We then decided to take the previously rejected approach of building everything up on the archives. The first versions were s3cmd based, and were something of a prototype, upon request from PvH to ship something more raw, but more quickly. We had just got early versions into production when the &lt;a href="https://aws.amazon.com/message/65648/">major disk apocalypse&lt;/a> hit in April 2011, though it was in something of an evaluation period and it was not yet a well-exercised &amp;amp; monitored program, so we crossed our fingers and, thankfully, it worked on every database, once AWS had capacity to spare.&amp;rdquo;&lt;/p>
&lt;p>At the end of the event it was eventually communicated that if we wanted, AWS could give us all of the EBS disks back, but it could be that all of them were corrupted. As someone we can’t recall adequately described it: &amp;ldquo;It’s like getting a gallon of ice cream, but it may have a turd in it&amp;rdquo;.&lt;/p>
&lt;p>More from @danfarina: &amp;ldquo;Were it up to me at the time, I probably would have moved onto converting it to boto (one of the most mature AWS client bindings at the time, by far) before stopping to deploy the s3cmd near-prototype on every instance, but that would have been a disaster.&amp;rdquo;&lt;/p>
&lt;p>Our applications were resilient because we could leverage multiple dynos and the routing layer available to our system. Databases are a little different, but how could we at scale give the features you most needed for a database:&lt;/p>
&lt;ol>
&lt;li>No data loss&lt;/li>
&lt;li>Improved availability&lt;/li>
&lt;/ol>
&lt;p>Number one was always a core charter for the team and we would prioritize it over features, and over uptime. Uptime mattered, features mattered, but as a data provider if we lost data we&amp;rsquo;d lose trust. Thus that quick prototype became wal-e, which went on to power many of the future features of Heroku Postgres and be used for many years, though it has now been deemed obsolete in favor of more modern tooling such as &lt;a href="https://pgbackrest.org/">pgBackRest&lt;/a>. But for its time and place it was some good execution and some luck on timing.&lt;/p>
&lt;h3 id="thinking-about-the-entire-experience" >
&lt;div>
Thinking about the entire experience
&lt;/div>
&lt;/h3>
&lt;p>As Heroku sat at a central point of app deployment we actively tried to think about the experience end to end. This manifested in some of the small things we actively campaigned for and collaborated with the community members who could make these happen. A few key examples come to mind for this.&lt;/p>
&lt;p>The first was &lt;code>DATABASE_URL&lt;/code>. Some of this originated from the &lt;a href="https://12factor.net">12factor concepts&lt;/a>, some in that having 5 environment variables to define what you were connecting to felt verbose and cumbersome. Why couldn’t Rails just use &lt;code>DATABASE_URL&lt;/code> if it were defined? I don’t recall the specifics here, but suspect this was something we nudged &lt;a href="https://www.twitter.com/hone02">@hone02&lt;/a> to help with.&lt;/p>
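&lt;p>For the unfamiliar, a single URL carries everything those separate variables did (the credentials here are obviously made up):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">DATABASE_URL=postgres://myuser:mypassword@db.example.com:5432/myapp
&lt;/code>&lt;/pre>&lt;/div>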
&lt;p>The second was around some features of Postgres. At the time Postgres was going well, but most of the core community focused on performance or enterprise-y features. We were coming at it from a different angle with an audience of Rails developers. We were intentional and engaged with some of the Postgres consultancies that employed committers, with a general theme of how can we help contribute, while also advancing Postgres based on the knowledge we have from users. A few highlights here included:&lt;/p>
&lt;ol>
&lt;li>Not just the &lt;code>DATABASE_URL&lt;/code> on the Rails side, but also in the native libpq wire protocol. While we didn’t do the work here, we spent notable time &lt;a href="https://www.postgresql.org/message-id/CAAcg%3DkWxsUeQ7Rz%3Dto4nvuwHJ%2BVj6ADrNHEcqFrGHnYmMNPznQ%40mail.gmail.com">advocating and engaging&lt;/a> around it.&lt;/li>
&lt;li>pg_stat_statements, in my opinion now the most valuable Postgres extension, existed before but was effectively unusable for most applications. Funding this work was foundational to making Postgres have more usable insights natively.&lt;/li>
&lt;li>JSON/JSONB collaboration&lt;/li>
&lt;/ol>
&lt;p>&lt;em>Of note we later hired several contributors to largely focus their time on upstream Postgres itself.&lt;/em>&lt;/p>
&lt;h3 id="dataclips-vs-the-team" >
&lt;div>
Dataclips vs. the team
&lt;/div>
&lt;/h3>
&lt;p>Throughout the history of Heroku Postgres various individuals made a series of bets. First it was @PvH pushing for Wal-E to get out the door as an MVP, which was absolutely the right call in retrospect. Perhaps the one that is most exciting to me, and that people least associate with Heroku, is dataclips. &lt;a href="https://twitter.com/mattsoldo">Matt Soldo&lt;/a> had this idea of GitHub Gists for your database. But in the early days of Heroku, like many places, as a PM you couldn’t easily mandate that engineers try a thing. You had to campaign and convince.&lt;/p>
&lt;p>Soldo didn’t build up enough buy-in from the team, but he had done plenty of awesome things, so it was worth letting him run with this. We were all wrong. Dataclips was built by an external consulting company as a separate standalone Heroku app. It was only live for weeks before suddenly it was powering all of our internal dashboards. A live query you could embed into Google Sheets–suddenly our internal KPI Rails app was replaced by dataclips and a Google Sheet. We didn’t need Looker or Tableau or other fancy BI tools, and this carried us to $100m in revenue for insights into the business. To this day dataclips is one of my favorite features of Heroku, and I look forward to making an experience like that but even better. Thanks, Soldo, for not listening to the rest of us back then.&lt;/p>
&lt;h3 id="names-matter" >
&lt;div>
Names matter
&lt;/div>
&lt;/h3>
&lt;p>For a long time databases haven&amp;rsquo;t been known for being user friendly. We wanted to pull from other paradigms as we were making key database capabilities available to people. We looked heavily to git for inspiration around forking/following. Master/slave was common database terminology, but we knew we could do better. We wanted to give the user a sense of what they could do with it. Archiving the WAL every 16 MB or 60 seconds (whichever came first) became continuous protection. A fork was a snapshot as of some point in time. A follower was something that follows a leader node (a read-replica). I still recall an hour-long analyst call with Redmonk with &lt;a href="https://www.twitter.com/monkchips">@monkchips&lt;/a> and &lt;a href="https://www.twitter.com/sogrady">@sogrady&lt;/a>–it was mostly wind them up and let them go (for the record &lt;a href="https://www.twitter.com/monkchips">@monkchips&lt;/a> didn&amp;rsquo;t love fork/follow, though I think he may have come around by now).&lt;/p>
&lt;p>This started even earlier than my time with being intentional about Postgres vs. PostgreSQL. But I’ve covered that one &lt;a href="https://www.craigkerstiens.com/2018/10/30/postgres-biggest-mistake/">before&lt;/a>, and you can even see it in some of the other lobbying around &lt;a href="https://www.postgresql.org/message-id/CAAcg%3DkWxsUeQ7Rz%3Dto4nvuwHJ%2BVj6ADrNHEcqFrGHnYmMNPznQ%40mail.gmail.com">libpq&lt;/a>.&lt;/p>
&lt;h3 id="peacetime-vs-wartime" >
&lt;div>
Peacetime vs. Wartime
&lt;/div>
&lt;/h3>
&lt;p>Things were rolling along well; we were shipping new features. We&amp;rsquo;d added dataclips, fork/follow had been in existence for a while–and then we got a note from Amazon that they were launching Postgres support for RDS at the next ReInvent. I was in person at that ReInvent, and when the moment was announced I&amp;rsquo;d never seen a roaring standing ovation like that at a tech conference before. In private channels we heard notes that this was because of us; the excitement and demand for Postgres became too clear for them to ignore and they had to add support.&lt;/p>
&lt;p>We felt vindicated in our choice of Postgres and in what we&amp;rsquo;d built. But we also knew that now we had competition. Running a database as a service on another company&amp;rsquo;s infrastructure, how can you compete? Well, from some years of experience now, I can say there are definitely ways, and I am confident that sharp, narrow focus allows you to build amazing products, which can be harder to do inside a large giant corporation. It was at one of the &lt;a href="https://www.postgresql.org/docs/9.1/datatype-datetime.html#DATATYPE-DATETIME-SPECIAL-TABLE">allballs&lt;/a> (UTC 00:00:00–when the data team would do happy hour on Fridays) that we were drinking beers and discussing how now we really had to focus. @&lt;a href="https://www.twitter.com/pvh">PvH&lt;/a> and @&lt;a href="https://www.twitter.com/danfarina">danfarina&lt;/a> were discussing me personally as a leader–apparently I&amp;rsquo;m okay in peacetime when things are good, but in wartime they were willing to bet on me.&lt;/p>
&lt;p>Two weeks later I walked into the exec team meeting with a 2-pager assessment of what might happen and how we could compete, including 3 potential acquisition targets that could give us a more differentiated offering. Within a few months we made one of those acquisitions, which later became Heroku Connect. It made a lot of sense for many reasons, including that &lt;a href="https://www.twitter.com/adam_g">Adam Gross&lt;/a> was an angel investor in Heroku, knew it well, and had helped build the Salesforce Platform in its early years. That wasn&amp;rsquo;t the end, but just the beginning of how we could actively compete with a more fully managed experience vs. simply being &amp;ldquo;hosted Postgres&amp;rdquo;.&lt;/p>
&lt;h3 id="metrics-and-monitoring-that-almost-was" >
&lt;div>
Metrics and Monitoring that almost was
&lt;/div>
&lt;/h3>
&lt;p>The next vision and goal for Heroku Postgres was to continue to give better ease of use and insights into your database. Postgres itself already has a ton of awesome data inside it about how it’s been used. The catalog tables and extensions like &lt;code>pg_stat_statements&lt;/code> have a wealth of information, but querying it can look like 200 lines of black magic SQL. &lt;a href="https://www.twitter.com/leinweber">@leinweber&lt;/a> was perhaps the first, and the best on the team, at quickly making something usable for people. The first step on our metrics journey was him making these Postgres insights trivially accessible via &lt;a href="https://blog.heroku.com/more_insight_into_your_database_with_pgextras">pg-extras&lt;/a>.&lt;/p>
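&lt;p>To give a flavor, one of the simpler checks in the pg-extras spirit, cache hit rate, looks roughly like:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- fraction of table reads served from shared_buffers
SELECT sum(heap_blks_hit)::float /
       nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
&lt;/code>&lt;/pre>&lt;/div>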
&lt;p>Continuing on the journey, internal foundational systems were built, focused on collecting various metadata from databases and on the ability to notify and communicate with users. In fact, we just spent 3 months at &lt;a href="https://www.crunchydata.com">@crunchydata&lt;/a> building spiritually aligned systems. I’m blanking on some of the system names, though some were obvious: observatory (I’m not sure if it’s still in use or not) would observe databases.&lt;/p>
&lt;p>These systems started to house a lot of information that then powered pieces within your Heroku Postgres dashboard. Things like slow queries and high IO or CPU load would give good insights when you logged in. The eventual goal was to connect the dots through proactive notifications. It’s one thing to get an alert from PagerDuty that things are off, log in to a dashboard, and fix it. But what if, at the early signs of things starting to go south, we emailed you that you’ve got an increasing number of sequential scans putting you at risk of IOPS saturation, and, because we understand Postgres, told you that you can add an index and resolve this with a single command? We could even give you an easy button in your email to add the index. All the foundations were in place; if you’ve run on Heroku you’ve gotten the notifications about database maintenance, and those were powered entirely by the underlying notification system built with this goal in mind.&lt;/p>
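&lt;p>That sequential scan signal is already sitting in Postgres. A rough sketch of the kind of check such a system could run (this is not the actual Heroku internals, just what the statistics views expose):&lt;/p>

```sql
-- Tables where sequential scans outnumber index scans,
-- often a hint that an index is missing.
SELECT
  relname,
  seq_scan,
  coalesce(idx_scan, 0)                           AS idx_scan,
  pg_size_pretty(pg_total_relation_size(relid))   AS total_size
FROM pg_stat_user_tables
WHERE seq_scan > coalesce(idx_scan, 0)
ORDER BY seq_scan DESC
LIMIT 10;
```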
&lt;h3 id="postgres-can-still-be-better-for-developers" >
&lt;div>
Postgres can still be better for developers
&lt;/div>
&lt;/h3>
&lt;p>But we never made it there. Some of us shuffled to different teams, some of us moved on to new challenges, and many folks came on after to continue to run and power a great database service. Some of those original engineers, Daniel Farina and Will Leinweber (along with PvH), understand the design, and the why behind it, as well as anyone. The goal from early on was that we could do better for developers.&lt;/p>
&lt;p>Two years ago when people asked why I came to &lt;a href="https://www.crunchydata.com">Crunchy Data&lt;/a>, I told them I had unfinished business. After the success at Citus tackling scalability problems, where the average customer had 40TB of data, I was attracted to the idea of returning to the vision we had back at the DoD of bringing a better Postgres experience to developers. Despite the rapid growth of successful DBaaS offerings, there was still something missing: that initial DoD idea that we still wanted to create.&lt;/p>
&lt;p>Postgres is an amazing database. It can handle hundreds of thousands of transactions per second, often without batting an eye. It has internal data that you can easily look at to assess and improve performance. It has a rich set of datatypes, indexing options, and functionality, and the extension ecosystem is vast. But as a developer you don’t have time to become an expert on Postgres.&lt;/p>
&lt;ul>
&lt;li>What if we told you when a N+1 query snuck into your Rails app?&lt;/li>
&lt;li>Connections don’t have to be a limitation on Postgres when you have pgBouncer right there.&lt;/li>
&lt;li>Have excessive indexes from the early stages of building your app? What if we told you about them, and with a button click you could drop them?&lt;/li>
&lt;/ul>
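&lt;p>That last one is data Postgres already tracks. A sketch of surfacing never-scanned indexes from the statistics views (stats reset on restart and standbys track their own usage, so verify before dropping anything):&lt;/p>

```sql
-- Indexes that have never been scanned since stats were last
-- reset: candidates for removal, largest first.
SELECT
  schemaname,
  relname                                         AS table_name,
  indexrelname                                    AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid))    AS index_size,
  idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```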
&lt;p>One of our &lt;a href="https://www.twitter.com/crunchydata">@crunchydata&lt;/a> customers described it better than I ever could. While working with some of our experts on a deep dive into their database, they said “you should take all his Postgres expertise and just bottle it up and send it in email or slack reports to me.” I want that expertise as a &lt;a href="https://crunchybridge.com/register">product&lt;/a>.&lt;/p>
&lt;p>My product strategy isn’t to go and change the world of databases. Postgres is a great database with a community that is making it better daily. I want to help make open source Postgres better and give back to it along the way. My product strategy is to distribute deep Postgres expertise in a consumable form to every customer of ours in the coming years. Oh we’ll ship some cool things along the way too.&lt;/p></description></item><item><title>Guidance for Scaling - Reversible vs. Irreversible Decisions</title><link>/2021/12/29/Guidance-for-Scaling-Reversible-vs.-Irreversible-Decisions/</link><pubDate>Wed, 29 Dec 2021 13:30:56 -0800</pubDate><guid>/2021/12/29/Guidance-for-Scaling-Reversible-vs.-Irreversible-Decisions/</guid><description>&lt;p>Was having a conversation with a founder earlier today and the topic of hiring functional leaders came up. I offered one of my common pieces of advice which was don&amp;rsquo;t hold the reins too tightly once you hire them. It&amp;rsquo;s something I see happen over and over to first time founders. You hire a new VP of Product and then still continue to oversee so much of the product process yourself. It is understandable, it&amp;rsquo;s your baby, you&amp;rsquo;ve spent years building it to this point, they don&amp;rsquo;t love it the same way you do.&lt;/p>
&lt;p>The likely outcome is your new VP of product won&amp;rsquo;t find success. They&amp;rsquo;ll feel they&amp;rsquo;re not able to execute on a vision of their own. They&amp;rsquo;ll feel micromanaged. Even the smallest decisions they&amp;rsquo;ll feel aren&amp;rsquo;t fully theirs and get second guessed.&lt;/p>
&lt;p>Perhaps you could do it better, that&amp;rsquo;s not necessarily the question. What you&amp;rsquo;re focused on is growing and scaling, this is the reason you hired them. So how do you do this without the business careening off tracks? Well first, empower them to make decisions, but beyond that there are a few things you can do so you feel more comfortable entrusting them with their functional area.&lt;/p>
&lt;h3 id="communication-is-key" >
&lt;div>
Communication is key
&lt;/div>
&lt;/h3>
&lt;p>First, clearly communicate your priorities and thought process. This doesn&amp;rsquo;t mean telling them what to do, but sharing priorities&amp;hellip; Bob Iger states it as &amp;ldquo;You have to convey your priorities clearly and repeatedly. In my experience, it’s what separates great managers from the rest&amp;rdquo;. I couldn&amp;rsquo;t immediately find the reference, but recall reading that he started each week with his execs by communicating his top 5 priorities across the company. This gives your team a line of sight into leadership&amp;rsquo;s thinking. It could be as simple as:&lt;/p>
&lt;ul>
&lt;li>I&amp;rsquo;m worried about our pipeline for next year&lt;/li>
&lt;li>I&amp;rsquo;m worried about retention and how it affects long term growth&lt;/li>
&lt;li>I&amp;rsquo;m worried about whether we have the tech stack to scale for the next 3 years&lt;/li>
&lt;/ul>
&lt;p>Something similar to that last one came up in the conversation, which led us into a brief discussion of reversible and irreversible decisions. In product management, one of the most common mistakes I see is folks not making a decision and heading in a single direction; instead they want all the info before they decide, which can cause as much harm as good (but I&amp;rsquo;m getting off track).&lt;/p>
&lt;h3 id="reversible-vs-irreversible-decisions" >
&lt;div>
Reversible vs. Irreversible Decisions
&lt;/div>
&lt;/h3>
&lt;p>Communicating to new execs a framework around which decisions are critical to get right, and which ones can be recovered from if wrong, is huge. Not all decisions are critical to get right. If you ship a typo in a blog post, you can easily fix it. If you send a wrong message to a single customer, it could create a bad experience for that customer but not impact all the rest. Knowing the scope and impact of a failure is an important part of a framework for decision making.&lt;/p>
&lt;p>We discussed that architecture and engineering decisions are harder to reverse than, say, marketing decisions. But I&amp;rsquo;m not sure that holds true: if you have a huge PR nightmare from bad marketing, it can take years to recover and heavily set back the business, just like a bad architecture decision. Even within engineering decisions you can use a framework of how costly a decision is to commit to and how easy it is to reverse.&lt;/p>
&lt;p>One of the best engineers I&amp;rsquo;ve ever worked with, for going on 15 years now, states it as not foreclosing future options. Leaving optionality open in engineering is one of those 10x things. It&amp;rsquo;s not elegant code or overengineering, and it&amp;rsquo;s highly immeasurable (at least to me), but cultivating it in your functional leadership, as well as in your more senior individual contributors, is huge.&lt;/p>
&lt;p>Which decisions can later be changed, and at what cost, is a great framework for how involved you should be vs. not, and the more you can guide your team through that, the smoother growing the team will be.&lt;/p>
&lt;ul>
&lt;li>SQL&lt;/li>
&lt;li>Excel&lt;/li>
&lt;li>Concise writing&lt;/li>
&lt;li>Story telling&lt;/li>
&lt;li>Prioritization.&lt;/li>
&lt;/ul>
&lt;p>I&amp;rsquo;ve spent quite a bit of time in product management roles, and in recent years more in leadership. I&amp;rsquo;ve found a lot of the skills in product translate into good leadership skills as well, but maybe I&amp;rsquo;m blurring the lines there. Regardless, with his 5 skills I found myself nodding; I&amp;rsquo;ve written about each of these on my blog and at times on twitter. He long since deleted the tweet, and while I wait for him to republish I thought I&amp;rsquo;d reprise a few of these with my own viewpoint.&lt;/p>
&lt;h3 id="sql" >
&lt;div>
SQL
&lt;/div>
&lt;/h3>
&lt;p>Yeah, I&amp;rsquo;m a &amp;ldquo;database&amp;rdquo; person. But not really, I&amp;rsquo;m a product person. But if I want to answer a question about what our customers are doing, 9 times out of 10 the answer to that question is hiding inside a &lt;a href="https://www.craigkerstiens.com/2019/02/12/sql-one-of-the-most-valuable-skills/">SQL database&lt;/a>. If it&amp;rsquo;s not in a SQL database, someone has made a SQL-like interface to access that data. If you want to feel like you have some magical super power that probably none of your peers possess, pick up SQL. How many people working in React know SQL? How many people that write Go know SQL? Same question if you know Ruby.&lt;/p>
&lt;p>Say you want a cohort analysis of users who created their freemium account 3 months ago: those that converted to paying within 30 days vs. those that converted after 30 days, fast converters vs. slow converters. I can probably write that in SQL before you&amp;rsquo;ve parsed what I&amp;rsquo;m trying to get at, let alone started to write it in any other language. That type of insight is powerful.&lt;/p>
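&lt;p>A sketch of that cohort query, assuming a hypothetical &lt;code>accounts&lt;/code> table with &lt;code>created_at&lt;/code> and &lt;code>converted_at&lt;/code> timestamps (your schema will differ):&lt;/p>

```sql
-- Hypothetical accounts table: signups from 3 months ago,
-- bucketed into fast (converted within 30 days of signup)
-- vs. slow converters.
SELECT
  CASE
    WHEN converted_at <= created_at + interval '30 days' THEN 'fast'
    ELSE 'slow'
  END      AS converter,
  count(*) AS accounts
FROM accounts
WHERE created_at >= now() - interval '3 months'
  AND converted_at IS NOT NULL
GROUP BY 1;
```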
&lt;p>Now if you think of how many people on a product team, or a management team, know SQL, you&amp;rsquo;re in a unique position. It really is a &lt;a href="https://www.craigkerstiens.com/2019/02/12/sql-one-of-the-most-valuable-skills/">super power&lt;/a>.&lt;/p>
&lt;h3 id="excel" >
&lt;div>
Excel
&lt;/div>
&lt;/h3>
&lt;p>This one is more common with MBAs and business types, but is nonetheless still valuable. While I love SQL, a pivot table in SQL isn&amp;rsquo;t quite the same. There are absolutely people that can spin circles around me in Excel (looking at you &lt;a href="https://twitter.com/rstephensme">@rstephensme&lt;/a>). But Excel reaches far more people than literally any programming language. It&amp;rsquo;s powerful and rich, and not all data is large and needs a database.&lt;/p>
&lt;p>Proficiency in quickly slicing and dicing things is huge. As you level up in your career you need to be able to take in a lot of information, ask questions of it, fact check it, and then make a decision on it. Excel is one big tool to help with this.&lt;/p>
&lt;h3 id="writing" >
&lt;div>
Writing
&lt;/div>
&lt;/h3>
&lt;p>My grammar is shit. I know it. But that&amp;rsquo;s mostly okay; I&amp;rsquo;ve explicitly focused on clear communication in my career. Concise and clear communication is huge. While the skills above are super key, if you can simply listen, ingest information that is not clear, and then regurgitate it in a clear manner, you can have a career in a lot of industries.&lt;/p>
&lt;p>A few disparate tips here:&lt;/p>
&lt;ul>
&lt;li>My favorite question to ask in any situation: &amp;ldquo;What problem are we trying to solve here?&amp;rdquo; It&amp;rsquo;s greatly focusing, and if you internalize it, it&amp;rsquo;ll steer you to better communication&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/mcgd">Michael Dearing&lt;/a> has a great talk on &lt;a href="https://www.heavybit.com/library/video/executive-communication/">executive communication&lt;/a> using SCQA, given at &lt;a href="https://www.heavybit.com">@heavybit&lt;/a>&lt;/li>
&lt;li>I run all important emails, blogs, etc. through &lt;a href="https://hemingwayapp.com/">hemingway app&lt;/a>, the lower the grade level the better&lt;/li>
&lt;/ul>
&lt;h3 id="story-telling" >
&lt;div>
Story telling
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;m actually not quite sure where to start on this one. At some level we all want to be entertained; Hollywood isn&amp;rsquo;t the industry it is because they forced us into it. But story telling is a harder one. I&amp;rsquo;ve never coached/mentored anyone on it, and I&amp;rsquo;m not even sure I&amp;rsquo;m an expert at it, but I definitely know a good story and the value of one. Watch a presentation that is monotone and reads off the slides: it&amp;rsquo;s not just that it&amp;rsquo;s monotone, it&amp;rsquo;s that the story isn&amp;rsquo;t there. I&amp;rsquo;ve found I personally love to follow some of the story board supervisors and animators from Pixar and Disney Animation that talk about story narrative. I&amp;rsquo;m not sure it&amp;rsquo;s made me better at this, but it&amp;rsquo;s entertaining.&lt;/p>
&lt;p>If you can be entertaining you can layer in the valuable pieces.&lt;/p>
&lt;p>But be yourself in the process.&lt;/p>
&lt;h3 id="prioritization" >
&lt;div>
Prioritization
&lt;/div>
&lt;/h3>
&lt;p>So many startups I&amp;rsquo;ve talked to have analysis paralysis. Many management folk in their first tenure have analysis paralysis. If there is literally one thing you can start doing around this, it is to &lt;a href="https://twitter.com/craigkerstiens/status/1381637698193031170">make a decision&lt;/a>, any decision. Measure and course correct later; way too much time is wasted in deciding, and that non-decision is a worse decision in and of itself.&lt;/p>
&lt;p>Now, when it comes to prioritizing and making the right decision: I&amp;rsquo;ve used the same tool for over 10 years. I&amp;rsquo;ve trained other teams on it, and I&amp;rsquo;ve run the exercise for teams that wanted coaching on it. Is it perfect? No. The important part is to find a process, stick with it, and perfect it. Per friend Rimas&amp;rsquo;s &lt;a href="https://gist.github.com/neovintage/2fb54c5b0cc403714bb94c621bcdd7fd">product principles&lt;/a>: &amp;ldquo;Be consistent - if you&amp;rsquo;re going to use a trick, use it a lot.&amp;rdquo; One of my tricks is gridding, an effort vs. impact matrix. You can check out a &lt;a href="https://www.craigkerstiens.com/2013/03/13/planning-and-prioritizing/">part 1&lt;/a> and &lt;a href="https://www.craigkerstiens.com/2013/08/13/the-rule-of-thirds-followup/">part 2&lt;/a> write-up.&lt;/p>
&lt;p>The key here is that when you step back to a higher level of granularity and put things side by side visually, a lot of things sort themselves out. We conduct the exercise over a 1-2 hr period, often at an offsite, though we&amp;rsquo;ve done a lot over zoom in the past year. Keep in mind an important part is making a decision on what you will and won&amp;rsquo;t do. The won&amp;rsquo;t part is as important as the will.&lt;/p>
&lt;h2 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h2>
&lt;p>Could there be 10 things on the list? Sure. Is there something more important than these? I&amp;rsquo;m not really sure. If you want to round out your skill set for the next few years in product or management-hope this helps. If you think it&amp;rsquo;s crap &lt;a href="https://www.twitter.com/jasoncwarner">@jasoncwarner&lt;/a>, but if you really like it I&amp;rsquo;m happy to take credit &lt;a href="https://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>&lt;/p></description></item><item><title>Exploring a new Postgres database</title><link>/2020/11/14/Exploring-a-new-Postgres-database/</link><pubDate>Sat, 14 Nov 2020 11:30:56 -0800</pubDate><guid>/2020/11/14/Exploring-a-new-Postgres-database/</guid><description>&lt;p>At past jobs I&amp;rsquo;d estimate we had 100 different production apps that in some way were powering key production systems. Sure some were backend/internal apps while others key production apps such as the dashboard itself. At other companies we had a smaller handful of Heroku apps that powered our cloud service, about 5-10 in total. Even just working with those internal apps it&amp;rsquo;s a number of things to keep context on. But when it comes to interacting with something you don&amp;rsquo;t know getting a lay of the land quickly is key. In helping a customer optimize and tune, or even just understand what is going on in their app an understanding of the data model is key.&lt;/p>
&lt;p>As I just started a few months back at &lt;a href="https://www.crunchydata.com">Crunchy Data&lt;/a> I found myself digging into a lot of new systems and quickly trying to ramp up and get a feel for them.&lt;/p>
&lt;p>Over the past 10 years I&amp;rsquo;ve pretty well codified my steps for getting a feel for a new database. While I&amp;rsquo;m not a DBA and only a small portion of my job is spent &lt;a href="/2013/01/10/more-on-postgres-performance/">inside a database&lt;/a>, being able to quickly navigate one saves me hours each month and days out of the year. I&amp;rsquo;m sure my process isn&amp;rsquo;t perfect, but hopefully it helps others navigating a new Postgres database for the first time.&lt;/p>
&lt;h3 id="first-the-tooling" >
&lt;div>
First the tooling
&lt;/div>
&lt;/h3>
&lt;p>For any new database my go-to tool is psql. The built-in &lt;a href="/2013/02/21/more-out-of-psql/">Postgres CLI&lt;/a> is going to be the quickest thing for me to navigate around. If you use a CLI for anything else then this should be your preference here as well. I&amp;rsquo;m also going to have a &lt;code>psqlrc&lt;/code> file set up with some good defaults. My go-to defaults in my psqlrc are:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sh" data-lang="sh">&lt;span style="display:flex;">&lt;span>-- Automatically format output based on result length and screen
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">\x&lt;/span> auto
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>-- Prettier nulls
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">\p&lt;/span>set null &lt;span style="color:#e6db74">&amp;#39;#&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>-- Save history based on database name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">\s&lt;/span>et HISTFILE ~/.psql_history- :DBNAME
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>-- Turn on automatic query timing
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">\t&lt;/span>iming
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="getting-a-feel-for-the-tables" >
&lt;div>
Getting a feel for the tables
&lt;/div>
&lt;/h3>
&lt;p>The first thing I&amp;rsquo;m going to do is just take a look at which objects exist within the database with &lt;code>\d&lt;/code>. This will spit out a mix of the tables, views, and sequences within your database. A cleaner version of this may be &lt;code>\dt&lt;/code>, which shows only tables:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">\&lt;/span>d
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> List &lt;span style="color:#66d9ef">of&lt;/span> relations
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">Schema&lt;/span> &lt;span style="color:#f92672">|&lt;/span> Name &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">Type&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">Owner&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">--------+-------------------------------+----------+----------------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> pg_stat_statements &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">view&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> schema_migrations &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> sessions &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> sessions_id_seq &lt;span style="color:#f92672">|&lt;/span> sequence &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> tasks &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> teams &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> users &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>(&lt;span style="color:#ae81ff">7&lt;/span> &lt;span style="color:#66d9ef">rows&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">\&lt;/span>dt
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> List &lt;span style="color:#66d9ef">of&lt;/span> relations
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">Schema&lt;/span> &lt;span style="color:#f92672">|&lt;/span> Name &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">Type&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">Owner&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">--------+-------------------------------+----------+----------------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> schema_migrations &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> sessions &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> tasks &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> teams &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">public&lt;/span> &lt;span style="color:#f92672">|&lt;/span> users &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">table&lt;/span> &lt;span style="color:#f92672">|&lt;/span> postgres
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>(&lt;span style="color:#ae81ff">5&lt;/span> &lt;span style="color:#66d9ef">rows&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We can also use the describe operation (&lt;code>\d&lt;/code>) on specific relations, such as tables, to get a feel for how they look:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">\&lt;/span>d users
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">Table&lt;/span> &lt;span style="color:#e6db74">&amp;#34;public.users&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">Column&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">Type&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">Collation&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">Nullable&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">Default&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">---------------------------+--------------------------+-----------+----------+--------------------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> id &lt;span style="color:#f92672">|&lt;/span> uuid &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">not&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span> &lt;span style="color:#f92672">|&lt;/span> uuid_generate_v4()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> email &lt;span style="color:#f92672">|&lt;/span> text &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">not&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#e6db74">&amp;#39;&amp;#39;&lt;/span>::text
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> encrypted_password &lt;span style="color:#f92672">|&lt;/span> text &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">not&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#e6db74">&amp;#39;&amp;#39;&lt;/span>::text
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> reset_password_token &lt;span style="color:#f92672">|&lt;/span> text &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> reset_password_sent_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">timestamp&lt;/span> &lt;span style="color:#66d9ef">with&lt;/span> time &lt;span style="color:#66d9ef">zone&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> remember_created_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">timestamp&lt;/span> &lt;span style="color:#66d9ef">with&lt;/span> time &lt;span style="color:#66d9ef">zone&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> last_sign_in_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">timestamp&lt;/span> &lt;span style="color:#66d9ef">with&lt;/span> time &lt;span style="color:#66d9ef">zone&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> created_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">timestamp&lt;/span> &lt;span style="color:#66d9ef">with&lt;/span> time &lt;span style="color:#66d9ef">zone&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">not&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> updated_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">timestamp&lt;/span> &lt;span style="color:#66d9ef">with&lt;/span> time &lt;span style="color:#66d9ef">zone&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">not&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name &lt;span style="color:#f92672">|&lt;/span> text &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">not&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> team_id &lt;span style="color:#f92672">|&lt;/span> uuid &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">not&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> deleted_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">timestamp&lt;/span> &lt;span style="color:#66d9ef">with&lt;/span> time &lt;span style="color:#66d9ef">zone&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">|&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Indexes:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;users_pkey&amp;#34;&lt;/span> &lt;span style="color:#66d9ef">PRIMARY&lt;/span> &lt;span style="color:#66d9ef">KEY&lt;/span>, btree (id)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;index_users_on_email&amp;#34;&lt;/span> &lt;span style="color:#66d9ef">UNIQUE&lt;/span>, btree (email)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;index_users_on_reset_password_token&amp;#34;&lt;/span> &lt;span style="color:#66d9ef">UNIQUE&lt;/span>, btree (reset_password_token)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">Foreign&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#66d9ef">key&lt;/span> &lt;span style="color:#66d9ef">constraints&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;users_team_id_fkey&amp;#34;&lt;/span> &lt;span style="color:#66d9ef">FOREIGN&lt;/span> &lt;span style="color:#66d9ef">KEY&lt;/span> (team_id) &lt;span style="color:#66d9ef">REFERENCES&lt;/span> teams(id)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="human-readable-output" >
&lt;div>
Human readable output
&lt;/div>
&lt;/h3>
&lt;p>Of course you may want to go one step further and actually get a sense of the data. Here a basic &lt;code>SELECT&lt;/code> with a &lt;code>LIMIT 1&lt;/code> tends to work. Since you don&amp;rsquo;t quite know the shape of the data, this is where having &lt;code>\x auto&lt;/code> set up within your &lt;code>.psqlrc&lt;/code> file is helpful to auto-format the output to your screen. You can also manually run &lt;code>\x auto&lt;/code> in your psql session to get cleaner output.&lt;/p>
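&lt;p>As a minimal sketch (the exact settings are up to you), a &lt;code>.psqlrc&lt;/code> with &lt;code>\x auto&lt;/code>, plus &lt;code>\timing&lt;/code> as another common option, might look like:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- switch to expanded display only when rows are too wide for the screen
\x auto
-- report how long each query takes
\timing
&lt;/code>&lt;/pre>&lt;/div>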
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#f92672">*&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> users
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">LIMIT&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">-&lt;/span>[ RECORD &lt;span style="color:#ae81ff">1&lt;/span> ]&lt;span style="color:#75715e">-------------+--------------------------------------------------------------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>id &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>a7a3cde&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">3613&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">4073&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">86&lt;/span>a7&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">6&lt;/span>a19b4e62bbe
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>email &lt;span style="color:#f92672">|&lt;/span> craig.kerstiens&lt;span style="color:#f92672">@&lt;/span>gmail.com
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>encrypted_password &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">$&lt;/span>&lt;span style="color:#f92672">#&lt;/span>IJ937Gmsdf00297sEmdfu12234
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>reset_password_token &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">#&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>reset_password_sent_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">#&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>remember_created_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#ae81ff">2016&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">07&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">14&lt;/span> &lt;span style="color:#ae81ff">14&lt;/span>:&lt;span style="color:#ae81ff">31&lt;/span>:&lt;span style="color:#ae81ff">01&lt;/span>.&lt;span style="color:#ae81ff">414795&lt;/span>&lt;span style="color:#f92672">+&lt;/span>&lt;span style="color:#ae81ff">00&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>last_sign_in_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#ae81ff">2020&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">02&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">12&lt;/span> &lt;span style="color:#ae81ff">21&lt;/span>:&lt;span style="color:#ae81ff">32&lt;/span>:&lt;span style="color:#ae81ff">53&lt;/span>.&lt;span style="color:#ae81ff">629246&lt;/span>&lt;span style="color:#f92672">+&lt;/span>&lt;span style="color:#ae81ff">00&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>created_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#ae81ff">2016&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">02&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">18&lt;/span> &lt;span style="color:#ae81ff">03&lt;/span>:&lt;span style="color:#ae81ff">03&lt;/span>:&lt;span style="color:#ae81ff">26&lt;/span>.&lt;span style="color:#ae81ff">403108&lt;/span>&lt;span style="color:#f92672">+&lt;/span>&lt;span style="color:#ae81ff">00&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>updated_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#ae81ff">2020&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">02&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">14&lt;/span> &lt;span style="color:#ae81ff">23&lt;/span>:&lt;span style="color:#ae81ff">16&lt;/span>:&lt;span style="color:#ae81ff">16&lt;/span>.&lt;span style="color:#ae81ff">080729&lt;/span>&lt;span style="color:#f92672">+&lt;/span>&lt;span style="color:#ae81ff">00&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>name &lt;span style="color:#f92672">|&lt;/span> Craig
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>team_id &lt;span style="color:#f92672">|&lt;/span> d46e864&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">1886&lt;/span>&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">45&lt;/span>e6&lt;span style="color:#f92672">-&lt;/span>b538&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">8991562&lt;/span>d2e99
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>deleted_at &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#f92672">#&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Time: &lt;span style="color:#ae81ff">91&lt;/span>.&lt;span style="color:#ae81ff">592&lt;/span> ms
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Most databases I work with now leverage JSONB. It&amp;rsquo;s a great tool for mixing semi-structured data with more structured data.&lt;/p>
&lt;p>If you&amp;rsquo;re using &lt;a href="https://info.crunchydata.com/blog/using-postgresql-for-json-storage">JSON&lt;/a> or &lt;a href="https://info.crunchydata.com/blog/using-postgresql-for-json-storage">JSONB&lt;/a>, then there is also a handy utility function to clean up that output: &lt;code>jsonb_pretty(yourcolumnhere)&lt;/code>. It will take care of making that huge JSON blob nice and readable.&lt;/p>
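&lt;p>A quick sketch of what that looks like, assuming a hypothetical &lt;code>events&lt;/code> table with a JSONB &lt;code>payload&lt;/code> column:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- pretty-print one JSONB document instead of a single-line blob
SELECT jsonb_pretty(payload)
FROM events
LIMIT 1;
&lt;/code>&lt;/pre>&lt;/div>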
&lt;h3 id="feel-more-at-home-when-you-encounter-a-new-database" >
&lt;div>
Feel more at home when you encounter a new database
&lt;/div>
&lt;/h3>
&lt;p>It doesn&amp;rsquo;t take hours of reading an ERD or schema files. Within about 5-10 minutes of connecting to a new database I&amp;rsquo;m able to get a sense of how things are structured and to actually start digging in. Don&amp;rsquo;t get me wrong, I&amp;rsquo;m by no means an expert in that time, but knowing some of these basic commands will really help the next time you encounter a database and are asked to help out or glean insights from it.&lt;/p></description></item><item><title>Spokesperson certification</title><link>/2020/04/21/Spokesperson-certification/</link><pubDate>Tue, 21 Apr 2020 12:55:56 -0800</pubDate><guid>/2020/04/21/Spokesperson-certification/</guid><description>&lt;p>One of my most fascinating work experiences was going through the spokesperson certification process at a large tech co. This isn&amp;rsquo;t some rubber-stamp, don&amp;rsquo;t-use-profanity-on-stage virtual training. This is the training they would give to any executive before they were greenlit to talk to press. When I say press I mean Techcrunch, but also Bloomberg, or Jim Cramer, or any major big-brand news outlet.&lt;/p>
&lt;p>As a product manager over a specific product line I knew my product well. Put me in front of an unhappy customer and I could lay out our roadmap, listen to their questions, take product input, and get them to a happy place. But this wasn&amp;rsquo;t about my product (only). A person with the spokesperson stamp could be asked any question about an entirely different area of the company. You had to know every recent product launch, all the key metrics, where traps might lie, and you had to land the core company messages in addition to the ones you cared about. To study, you received about 100 pages of a PowerPoint presentation with key releases from each product, key numbers, and customer stories.&lt;/p>
&lt;p>The certification itself was an interview. They flew in a former news reporter. You walked into a conference room; the lights were off, except for a bright light focused on the seat you&amp;rsquo;d sit in, with a camera rolling. It felt more like an intense interrogation room than a big tech co conference room. To make sure no light got in and no one walking by stopped to look, they put up black paper to completely black out the room.&lt;/p>
&lt;p>Oh, and the worst part of the process&amp;hellip; I was told by our marketing person that I could wear my hat during the interview&amp;hellip; as I always do. (I was probably the only person not in a full suit they saw the entire day. And probably the only person that ever walked in with a ballcap on.) Well, she said yes of course, but it turns out they couldn&amp;rsquo;t see my face under the lights; it was just a shadow, so I had to take it off. From the outset I&amp;rsquo;d been tricked, but I digress.&lt;/p>
&lt;p>The questions would start with the basics: tell me about yourself and your background; can you tell us what&amp;rsquo;s new and exciting about product x. Then over time it would delve into the other product areas. I told some of the canned stories and some personal ones. &lt;em>On the personal ones I learned it&amp;rsquo;s up to me, but they coach their executives not to say spouses&amp;rsquo; or children&amp;rsquo;s names; it can only get them in trouble.&lt;/em> For each product line you were supposed to hit 2 major news announcements, 2 customers, and 2 key stats (&lt;em>i.e. we crossed 1 billion mentions of my name&lt;/em>).&lt;/p>
&lt;p>But it wasn&amp;rsquo;t just softballs. There were traps. You were asked about an executive that recently left, and whether the product line was okay or whether it was a sign of bad things to come ahead of earnings. It&amp;rsquo;s fine though, I reassured them; they&amp;rsquo;d made some great contributions to the company, and wanted to spend some time with their family and give back to their local community. Of course, internal speculation was that they had cashed out and were interested in running for public office.&lt;/p>
&lt;p>As the camera turned off, the PR team wanted to dig deeper on a few of the stories I&amp;rsquo;d told about customers and products, remarking &amp;ldquo;That&amp;rsquo;s amazing, we power that? I had no idea.&amp;rdquo; and &amp;ldquo;That&amp;rsquo;s awesome they&amp;rsquo;re able to do that at scale thanks to us.&amp;rdquo;&lt;/p>
&lt;p>In the end I got the stamp of approval; I was cleared to talk to folks. The funny part was the comments from the PR team afterwards. They liked me, I seemed relatable, I nailed all the numbers. I wasn&amp;rsquo;t like any of their other spokespeople, and, well, they weren&amp;rsquo;t quite sure what to do with that. In the future I was used for some very particular media folks that seemed not to want a cookie-cutter spokesperson. I&amp;rsquo;m good with that; hopefully it helped the company.&lt;/p>
&lt;p>In the end it was a fascinating experience. The ability to bridge and condense a lot of information (relatable story, customer brand validation, stats, and something quotable) into a single answer, all from a question that was meant to be a trap for a juicy story, has easily been one of the top work experiences I&amp;rsquo;ve encountered.&lt;/p></description></item><item><title>Lessons from college: Efficient meetings</title><link>/2020/03/17/lessons-from-college-effecient-meetings/</link><pubDate>Tue, 17 Mar 2020 12:55:56 -0800</pubDate><guid>/2020/03/17/lessons-from-college-effecient-meetings/</guid><description>&lt;p>I think back to my time in college, and I learned some valuable things. I also learned some incredibly worthless things (i.e. don&amp;rsquo;t flip a car upside down and then back over&amp;hellip; it&amp;rsquo;ll break the axle so you can&amp;rsquo;t roll it). Even in classes&amp;hellip; the basic approach to a supply/demand curve to maximize profit is cute when done in a classroom vs. the complexities of how things actually work&amp;hellip; I mean I get the idea behind it, but what you learn is so far from being translatable into something usable. But what surprises me looking back is a couple of skills around running meetings that I find so rare in the workplace and that have immense value.&lt;/p>
&lt;p>I&amp;rsquo;ve always been fascinated at the intersection of business and technology. I&amp;rsquo;d been coding for a long time before college, and while interesting it was also a means to an end. When you combine technology with business you can solve things in entirely new and valuable ways. My major was management information systems, and all folks in my program came out with a computer science minor in addition to their business degree–something pretty rare for most MIS majors in other programs and, generally, for anyone coming out of a business school. Perhaps I&amp;rsquo;ll get into the value of CS training even if you aren&amp;rsquo;t looking for a CS job some other time.&lt;/p>
&lt;p>Within the program we had a senior project that was actually a real-world project for one of the large companies that sponsored part of the program. We&amp;rsquo;d have monthly reviews with the company stakeholder. We&amp;rsquo;d also have weekly meetings, and these were especially well run. There were really 3 things that made them so efficient.&lt;/p>
&lt;h3 id="1-agendas-circulated-out-24-hours-ahead-of-time" >
&lt;div>
1. Agendas circulated out 24 hours ahead of time
&lt;/div>
&lt;/h3>
&lt;p>Before each meeting there was a very explicit agenda. This was circulated 24 hours in advance; at almost exactly the 24-hour mark the professors would inquire about any delay in the agenda. This early circulation allowed for:&lt;/p>
&lt;ol>
&lt;li>Time to review and prepare&lt;/li>
&lt;li>Ability to make modifications&lt;/li>
&lt;/ol>
&lt;p>A sample agenda may look something like:&lt;/p>
&lt;ul>
&lt;li>(5 minutes) - review last week&amp;rsquo;s action items&lt;/li>
&lt;li>(10 minutes) - review blockers
&lt;ul>
&lt;li>blocker 1 - foo&lt;/li>
&lt;li>blocker 2 - bar&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>(20 minutes) - feature design walk through of x&lt;/li>
&lt;li>(15 minutes) - troubleshooting of&lt;/li>
&lt;li>(10 minutes) - review action items and next steps&lt;/li>
&lt;/ul>
&lt;h3 id="2-explicit-roles-for-the-meeting" >
&lt;div>
2. Explicit roles for the meeting
&lt;/div>
&lt;/h3>
&lt;p>There were three roles explicitly set for each meeting:&lt;/p>
&lt;ul>
&lt;li>&lt;em>Manager&lt;/em> - This was essentially the person responsible for setting up the agenda, ensuring the agenda was followed, and making sure everyone was involved and an active participant. I find that last piece is still very key today: making sure folks that are remote have a chance to chime in, or ensuring very junior people are heard. Nowadays I often keep a small tally of how often various people within a meeting speak and make sure to give those that don&amp;rsquo;t a chance to.&lt;/li>
&lt;li>&lt;em>Timekeeper&lt;/em> - 75% of the meetings I&amp;rsquo;m in run long, and at the end there is a big scramble to figure out what the result of the meeting was and what&amp;rsquo;s next. A timekeeper making sure you spend the allocated amount of time on each item is key. If you don&amp;rsquo;t do this, the 10 minutes for reviewing action items and next steps gets squeezed and you lose a lot of the value of the meeting.&lt;/li>
&lt;li>&lt;em>Scribe&lt;/em> - A person whose sole job is to take notes ensures there are good-quality notes. These would be kept within a SharePoint site that was circulated around.&lt;/li>
&lt;/ul>
&lt;h3 id="3-after-the-meeting" >
&lt;div>
3. After the meeting
&lt;/div>
&lt;/h3>
&lt;p>While the scribe was the one to take the notes, the work wasn&amp;rsquo;t done when the meeting was over. The notes were then circulated around, and everyone on the team would review and make comments/notes on things they felt were different or where details were missed. At the end, in the action-item review, very clear owners were assigned and next steps laid out.&lt;/p>
&lt;p>Within 24 hours after the meeting, while it was fresh, everyone was required to review and acknowledge the notes. This put a closed bookend on the meeting and made for a clear transition for each of the roles to move on to the next one.&lt;/p>
&lt;h2 id="im-sorry-professors-you-were-right" >
&lt;div>
I&amp;rsquo;m sorry professors, you were right
&lt;/div>
&lt;/h2>
&lt;p>At the time, it was annoying. I could take my own notes. I remembered the things I needed to do. If we went over on time for one item it was because it was important. But now&amp;hellip; putting these things in place, any time I replicate this I get more time in my day and the team gets more done. Science/Math/History&amp;hellip; sure, but running efficient meetings I never would have expected how basic but also challenging and how valuable.&lt;/p></description></item><item><title>An interview on what makes Postgres unique (extensions)</title><link>/2019/11/13/postgres-interview-from-art-of-postgresql/</link><pubDate>Wed, 13 Nov 2019 12:55:56 -0800</pubDate><guid>/2019/11/13/postgres-interview-from-art-of-postgresql/</guid><description>&lt;p>I&amp;rsquo;ve been at dinners before with developers that admitted developers, themselves included, can be a bit opinionated. In one case one said for example, &amp;ldquo;I love Postgres, but I have no idea why.&amp;rdquo; They were sitting at the wrong table to use Postgres as an example&amp;hellip; But it is quite often that I am asked &lt;a href="/2017/04/30/why-postgres-five-years-later/">Why Postgres&lt;/a>.&lt;/p>
&lt;p>In fact a little over a year ago a good friend, Dimitri Fontaine, asked if he could interview me for a book he&amp;rsquo;s working on &lt;a href="https://theartofpostgresql.com/?affiliate=cek">for Postgres&lt;/a>. I&amp;rsquo;ve long said there is a shortage of good books about Postgres, and he&amp;rsquo;s done a great job with his, providing a guide targeted at developers, not just DBAs, who want to become better with their database. What follows is an excerpt of the interview from the book. And if you&amp;rsquo;re interested in picking up a copy, he was friendly enough to share a discount code, which you can find below.&lt;/p>
&lt;p>&lt;strong>Intro&lt;/strong>&lt;/p>
&lt;p>Craig heads up the Cloud team &lt;a href="https://www.twitter.com/citusdata">@citusdata&lt;/a> &lt;strong>now running product for Azure Postgres since being acquired by Microsoft&lt;/strong>. Citus extends Postgres to be a horizontally scalable distributed database. If you have a database, especially Postgres, that needs to scale beyond a single node (typically at 100GB and up) Craig is always happy to chat and see if Citus can help.&lt;/p>
&lt;p>Previously Craig has spent a number of years @heroku, a platform-as-a-service, which takes much of the overhead out of IT and lets developers focus on building features and adding value. The bulk of Craig&amp;rsquo;s time at Heroku was spent running product and marketing for Heroku Data.&lt;/p>
&lt;p>&lt;strong>In your opinion, how important are extensions for the PostgreSQL open source project and ecosystem?&lt;/strong>&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>To me the extension APIs and growing ecosystem of extensions are the
biggest advancement to Postgres in probably the last 10 years. Extensions
have allowed Postgres to extend beyond a traditional relational database
to much more of a data platform. From the initial NoSQL datatypes
(if we exclude XML, that is) in hstore, to the rich geospatial
feature set in PostGIS, to approximation algorithms such as HyperLogLog or
TopN, you have extensions that by themselves take Postgres into a new
frontier.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;blockquote>
&lt;p>&lt;em>Extensions allow the core to move at a slower pace, which makes sense.
Each new feature in core means it has to be thoroughly tested and safe.
That&amp;rsquo;s not to say that extensions aren&amp;rsquo;t, but extensions that can
exist outside core, and later become part of contrib, provide a great
on-ramp for things to move much faster.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>&lt;strong>What are your favorite PostgreSQL extensions, and why?&lt;/strong>&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>My favorite three extensions are:&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;blockquote>
&lt;ol>
&lt;li>&lt;em>pg_stat_statements&lt;/em>&lt;/li>
&lt;li>&lt;em>Citus&lt;/em>&lt;/li>
&lt;li>&lt;em>HyperLogLog&lt;/em>&lt;/li>
&lt;/ol>
&lt;/blockquote>
&lt;blockquote>
&lt;p>&lt;em>&lt;code>pg_stat_statements&lt;/code> is easily the most powerful extension
for an application developer, offering insights to optimize their database
without having to understand deep database internals. For many application
developers the database is a black box, but &lt;code>pg_stat_statements&lt;/code> is a
great foundation for AI for your database that I only expect to be
improved upon in time.&lt;/em>&lt;/p>
&lt;/blockquote>
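&lt;p>For a quick illustration, once the extension is installed a query along these lines surfaces the most expensive statements (the timing column names here are from older releases; Postgres 13+ renamed &lt;code>total_time&lt;/code> to &lt;code>total_exec_time&lt;/code>):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- requires shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- top 10 queries by total time spent
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
&lt;/code>&lt;/pre>&lt;/div>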
&lt;blockquote>
&lt;p>&lt;em>&lt;em>Citus&lt;/em>: I&amp;rsquo;m of course biased because I work there, but I followed Citus
and pg_shard for 3 years prior to joining. Citus turns Postgres into a
horizontally scalable database. Under the covers it&amp;rsquo;s sharded, but
application developers don&amp;rsquo;t have to think or know about that complexity.
With Citus Postgres is equipped to tackle larger workloads than ever
before as previously Postgres was constrained to a single box or overly
complicated architectures.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;blockquote>
&lt;p>&lt;em>&lt;em>HyperLogLog&lt;/em>: I have a confession to make. In part I just love
saying it, but it also makes you seem uber-intelligent when you read about
the algorithm itself. &amp;ldquo;K minimum value, bit pattern observables,
stochastic averaging, harmonic averaging.&amp;rdquo; I mean who doesn&amp;rsquo;t want
to use something with all those things in it? In simpler terms, it gives
close-enough approximate unique counts that are composable, with a really
small storage footprint. If you&amp;rsquo;re building something like a web
analytics tool, HyperLogLog is an obvious go-to.&lt;/em>&lt;/p>
&lt;/blockquote>
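&lt;p>For a quick illustration with the postgresql-hll extension, assuming a hypothetical &lt;code>page_views&lt;/code> table with an integer &lt;code>user_id&lt;/code>, approximate distinct users per day looks like:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">CREATE EXTENSION IF NOT EXISTS hll;

-- hash each user id, aggregate into an hll sketch per day,
-- then read the approximate cardinality back out
SELECT date_trunc('day', viewed_at) AS day,
       hll_cardinality(hll_add_agg(hll_hash_integer(user_id))) AS approx_uniques
FROM page_views
GROUP BY 1
ORDER BY 1;
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The nice part is that the per-day sketches are composable: they can later be unioned to get approximate uniques over any larger window without rescanning the raw rows.&lt;/p>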
&lt;p>&lt;strong>How do you typically find any extension you might need? Well, how do you
know you might need a PostgreSQL extension in the first place?&lt;/strong>&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>&lt;a href="https://pgxn.org">pgxn.org&lt;/a> and github are my two go-tos. Though Google
also tends to work pretty well. And of course I stay up to date on new
ones via &lt;a href="https://postgresweekly.com">PostgresWeekly.com&lt;/a>.&lt;/em>&lt;/p>
&lt;p>&lt;em>Though in reality I often don&amp;rsquo;t realize I need one. I search for
the problem I&amp;rsquo;m trying to solve and discover it. I would likely never
search for HyperLogLog, but a search for Postgres approximate count or
approximate distincts would yield it pretty quickly.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>&lt;strong>Is there any downside you could think of when your application code base
now relies on some PostgreSQL extension to run? I could think of extension&amp;rsquo;s
availability in cloud and SaaS offerings, for instance.&lt;/strong>&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>It really depends. There are extensions that are much more bleeding edge,
and ones that are more mature. Many of the major cloud providers support a
range of extensions, but they won&amp;rsquo;t support any extension. If they do
support it there isn&amp;rsquo;t a big downside to leveraging it. If they don&amp;rsquo;t you
need to weigh the cost of running and managing Postgres yourself vs. how
much value that particular extension would provide. As with all things
managed vs. not, there is a trade-off there and you need to decide which
one is right for you.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;blockquote>
&lt;p>&lt;em>Though if something is supported and easy to leverage wherever you run,
by all means, go for it.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend who has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Interesting Upcoming pgDays</title><link>/2019/10/29/Interesting-Upcoming-pgDays/</link><pubDate>Tue, 29 Oct 2019 12:55:56 -0800</pubDate><guid>/2019/10/29/Interesting-Upcoming-pgDays/</guid><description>&lt;p>I&amp;rsquo;ve been to a lot of conferences over the years: &lt;a href="https://2019.pgconf.eu/">PgConf EU&lt;/a>, &lt;a href="https://postgresopen.org">PostgresOpen&lt;/a>, too many pgDays to count, and even more non-Postgres conferences (OSCON, Strangeloop, Railsconf, PyCon, LessConf, and many more). I&amp;rsquo;ve always found Postgres conferences one of the best places to get training and learn about what&amp;rsquo;s new with Postgres (in addition to Dimitri&amp;rsquo;s recent book, more on that below). They&amp;rsquo;re my regular stop to catch up on all the new features of a release before it comes out, and often there is a talk highlighting what is new with a simple, easy-to-understand summary once it&amp;rsquo;s released.&lt;/p>
&lt;p>I just got back from PGConf EU a little over a week ago and it was a great time. I&amp;rsquo;m sure we&amp;rsquo;ll see some rundowns of it start appearing on Postgres planet. But, as far as I&amp;rsquo;m concerned, PGConf EU is in the past (unless you&amp;rsquo;re counting next year, which is in Berlin, in which case I&amp;rsquo;ll see you there). For me it&amp;rsquo;s time to look to the future, and there are a number of upcoming pgDays I&amp;rsquo;m looking forward to.&lt;/p>
&lt;p>The first two I want to highlight are separate events, but you&amp;rsquo;ll notice they&amp;rsquo;re scheduled nicely for you to easily attend both. With a day in between for travel, you&amp;rsquo;ll find that many speakers and attendees depart one and head straight to the other. It makes for an easy opportunity to visit two cities and see two different communities, yet not have to spend too much time traveling. The first is &lt;a href="https://2020.nordicpgday.org/">Nordic pgDay&lt;/a> in Helsinki. It&amp;rsquo;s coming up on March 24. The second is &lt;a href="https://2020.pgday.paris/">pgDay Paris&lt;/a> on March 26. Both of these are great single-track conferences. If you&amp;rsquo;re in Europe or fancy a trip to Europe, I recommend giving them a look; even better, the CFPs are open, so consider submitting.&lt;/p>
&lt;p>Another pgDay I have to mention is right in my backyard. &lt;a href="https://2020.pgdaysf.org">pgDay SF&lt;/a> I&amp;rsquo;m particularly excited for a few reasons:&lt;/p>
&lt;ul>
&lt;li>San Francisco is very much a central tech hub, which means a great chance of learning from folks at the many, many interesting tech companies in attendance&lt;/li>
&lt;li>Just like Nordic and Paris, this is a single-track conference, which I&amp;rsquo;m a personal fan of because you can have continuity between talks and shared conversation with other attendees&lt;/li>
&lt;li>The venue! If you&amp;rsquo;re not from the bay area you may not be aware, but Swedish American Hall is a well-known music venue in SF. It&amp;rsquo;s hosted many famous artists &lt;a href="https://en.wikipedia.org/wiki/Cafe_Du_Nord">over the years&lt;/a> and now pgDay SF joins the ranks.&lt;/li>
&lt;/ul>
&lt;p>This isn&amp;rsquo;t an exhaustive list of course, just a few on my personal list that I hope to make it to. If you&amp;rsquo;re there and see me make sure to say hi!&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend who has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Guidance for conducting offsites</title><link>/2019/10/10/Guidance-for-conducting-offsites/</link><pubDate>Thu, 10 Oct 2019 12:55:56 -0800</pubDate><guid>/2019/10/10/Guidance-for-conducting-offsites/</guid><description>&lt;p>Offsites are an invaluable tool for getting a team aligned. I&amp;rsquo;ve been a part of organizations where offsites never happened, and ones where they happened at a regular interval. Just because offsites happened didn&amp;rsquo;t mean they all had the same significant impact on alignment and the ability to execute moving forward. What follows are a few key principles for conducting an impactful offsite.&lt;/p>
&lt;h3 id="get-out-of-the-office" >
&lt;div>
Get out of the office
&lt;/div>
&lt;/h3>
&lt;p>An offsite isn&amp;rsquo;t reserving a conference room for a full day and just sitting there meeting on special topics. A change of scenery is important, and nice scenery will impact your ability to collaborate. &lt;em>One longtime friend and colleague told me, about 10 years into his working career, that the thing he most values in a workplace is natural light. He remarked that his younger self wanted free food and drinks; now he finds himself much happier and more productive with natural light above all other amenities.&lt;/em> A similar environment for an offsite is very helpful.&lt;/p>
&lt;p>The best option isn&amp;rsquo;t a hotel with a rented conference room, but rather an Airbnb where each person has their own room, plus a large kitchen and a large living room. More on the why for each of those later.&lt;/p>
&lt;p>Having worked with teams that are mostly local with a portion remote, I&amp;rsquo;ve found hosting the offsite within a 2 hr drive of your primary office works well. You don&amp;rsquo;t waste a full day traveling, and remote team members can also budget 1-2 days in the office to see other team members in addition to the time spent at the offsite.&lt;/p>
&lt;h3 id="collaboration-is-key" >
&lt;div>
Collaboration is key
&lt;/div>
&lt;/h3>
&lt;p>Get rid of PowerPoint presentations and someone presenting on a screen. That isn&amp;rsquo;t collaboration; that is a presentation. If you have a team of 8 and each person presents for an hour, you haven&amp;rsquo;t collaborated for 8 hrs: you&amp;rsquo;ve been presented to for 7 and presented for 1. I personally don&amp;rsquo;t like team building exercises, but I&amp;rsquo;ll admit they can be useful here.&lt;/p>
&lt;p>A few minutes of presentation by leadership at the offsite to frame things can be helpful. The key is to not have a projector and folks staring at a screen. Instead, have a few prepared notes and the ability to discuss interactively. Collaborating on a large notepad, or using sticky notes for ideas and brainstorming, works great. One of my favorite things to do at an offsite is a &lt;a href="/2013/08/13/rule-of-thirds/">gridding exercise&lt;/a>.&lt;/p>
&lt;h3 id="2-3-days-is-ideal" >
&lt;div>
2-3 days is ideal
&lt;/div>
&lt;/h3>
&lt;p>You don&amp;rsquo;t need a full week offsite, but a 1 day offsite doesn&amp;rsquo;t really give ideas time to evolve and change. The ideal to me is 3 days, where you can make some statements about ideas on day 1, and then drill deeper into those on day 2 or 3. Having time to research data and evaluate any plans you make is helpful. You can also have good collaborative time while setting aside separate time for contentious topics. If you cover a contentious topic and then jump right into collaboration without time to change your mindset, you&amp;rsquo;re going to get a very different outcome. Breaking things up across days makes this way easier.&lt;/p>
&lt;p>2-3 days also allows time for non-work activities. Coming together as a team will let you work better in the future. Boardgames are a common favorite, video games an occasional activity. Sometimes it&amp;rsquo;s just sitting by a fireplace, enjoying a nice glass of wine, and catching up.&lt;/p>
&lt;p>&lt;em>Those moments can live on for a very long time. Many colleagues from years ago will recall how, while playing &amp;ldquo;We Didn&amp;rsquo;t Playtest This At All&amp;rdquo;, we intentionally brought an archived server back online to trigger a page so an engineer would pick up his phone, only for one of us to play a card that made him lose the game for holding his phone. Those stories create a foundation and shared experience to build on in the future for how you work together.&lt;/em>&lt;/p>
&lt;p>This takes me onto the next point&amp;hellip;&lt;/p>
&lt;h3 id="food" >
&lt;div>
Food
&lt;/div>
&lt;/h3>
&lt;p>Too often offsites are a day of meetings in a conference room, followed by dinner out at a nice restaurant. The exact opposite works better. We&amp;rsquo;ve already talked about location, but food should be collaborative as well. Over 2-3 days you may do one dinner out, but even that isn&amp;rsquo;t the norm. Instead, make a meal list ahead of time and have people spend time in the kitchen together. You&amp;rsquo;ll typically find 1-2 stronger cooks on the team, but then you have a lot of helpers.&lt;/p>
&lt;p>&lt;em>It actually reminds me a bit of my wedding reception back home. I smoked about 80 lbs of meat for the reception. As we were taking the meat off, I walked back in to 4 generations of in-laws, the older ones teaching the younger ones how to pull pork.&lt;/em>&lt;/p>
&lt;p>Cooking together is another form of collaborating. You can easily learn about cultures and backgrounds. I didn&amp;rsquo;t realize melon with prosciutto was always supposed to be served with port if you&amp;rsquo;re in France, but now I can eat it no other way. Splitting up the duties so that 2-3 people help cook a meal means you don&amp;rsquo;t spend all your time cooking, but you get people working together.&lt;/p>
&lt;h3 id="dont-steal-a-weekend" >
&lt;div>
Don&amp;rsquo;t steal a weekend
&lt;/div>
&lt;/h3>
&lt;p>I mentioned 2-3 days is ideal. Please don&amp;rsquo;t use one of those days as a weekend. As the leader or coordinator you may not have weekend obligations and may be happy to give up a day. But many on your team have other commitments; an offsite is valuable, but asking people to give up their weekend to essentially do more work isn&amp;rsquo;t a great look.&lt;/p>
&lt;h3 id="offsites-for-everyone" >
&lt;div>
Offsites for everyone
&lt;/div>
&lt;/h3>
&lt;p>Offsites aren&amp;rsquo;t just for management. The folks on the ground doing the work have as much need for them, and get as much value, as the leadership team sitting there determining the next 5 year strategy. Allowing the engineers who are building things, supporting customers, and answering pages to brainstorm and collaborate on how to design systems more effectively and move more quickly will pay huge dividends. You&amp;rsquo;ll find experiments, and then new tools and systems that allow you to ship things faster with higher quality. Collaborating with the product team is key as well; an engineering offsite in isolation leaves out the voice of the customer. A product offsite without engineering leaves you daydreaming of solutions without being grounded in the reality of what is possible.&lt;/p></description></item><item><title>The Engineering Manager/Product Manager Marriage</title><link>/2019/06/30/The-Engineering-Manager/Product-Manager-Marriage/</link><pubDate>Sun, 30 Jun 2019 12:55:56 -0800</pubDate><guid>/2019/06/30/The-Engineering-Manager/Product-Manager-Marriage/</guid><description>&lt;p>I&amp;rsquo;ve worked as a PM at companies of a number of sizes for a few years now: at startups, and then as part of larger companies once those startups were acquired. I&amp;rsquo;ve been the first PM for a team as well as the first for a company. I&amp;rsquo;ve written at times about product management, and today I&amp;rsquo;d like to drill into one aspect that doesn&amp;rsquo;t seem to get talked about enough: the pairing of product manager and engineering manager.&lt;/p>
&lt;h2 id="mom-vs-dad" >
&lt;div>
Mom vs. Dad
&lt;/div>
&lt;/h2>
&lt;p>As parents my partner and I have learned very quickly that we need to have a consistent voice and unified view of things. I care that our son watches less Power Rangers, otherwise he&amp;rsquo;s going to use his megazord powers on our TV and we&amp;rsquo;ll be watching a lot more of nothing. My partner cares that when we&amp;rsquo;re visiting family in the south the kids drink enough water so they don&amp;rsquo;t get dehydrated.&lt;/p>
&lt;p>Meanwhile my kids are experts at leveraging us to get what they want. My daughter came to me last night asking if she could play on her iPad some. Not knowing if she already had, or if my partner had already given an answer, my safest question was: have you asked your mom? In the absence of knowing, my default isn&amp;rsquo;t a yes or no, it&amp;rsquo;s a &amp;ldquo;let me learn more&amp;rdquo; and then potentially a discussion.&lt;/p>
&lt;h2 id="em-vs-pm" >
&lt;div>
EM vs. PM
&lt;/div>
&lt;/h2>
&lt;p>As a PM I want us to build a rich and powerful product, but there is a balance between doing too little and doing too much. It isn&amp;rsquo;t always a question of doing more; we need to make sure the product is well built. In order to do that we need to say no at times. Saying no, as well as yes, needs to come from a unified front, both engineering and product. If one side is agreeing without being aligned with the other, you&amp;rsquo;re going to wind up with a confused and frustrated team.&lt;/p>
&lt;p>There are a number of ways engineering managers and product managers can stay aligned. The first starts with being explicit with each other: if one side is communicating things the other doesn&amp;rsquo;t agree with&amp;hellip; sit down and have a conversation about it. Over time you can get to a well-aligned position where you don&amp;rsquo;t have to discuss every question before you give an answer. My partner knows to make sure we don&amp;rsquo;t watch more than 2 episodes of Power Rangers, and when we&amp;rsquo;re too amped up on it we shorten that.&lt;/p>
&lt;ul>
&lt;li>Are you having regular 1:1s with your counterpart?&lt;/li>
&lt;li>Do you review emails out to the team ahead of time to capture feedback?&lt;/li>
&lt;li>Do you have conversations where you&amp;rsquo;re not on the same page in an offline fashion as opposed to in team meetings?&lt;/li>
&lt;li>Are you treating your EM/PM just like your partner at home? &lt;em>(I guess that does assume you treat your partner well, but that&amp;rsquo;s a whole other blog post)&lt;/em>&lt;/li>
&lt;/ul></description></item><item><title>Come over for dinner</title><link>/2019/05/01/Come-over-for-dinner/</link><pubDate>Wed, 01 May 2019 12:55:56 -0800</pubDate><guid>/2019/05/01/Come-over-for-dinner/</guid><description>&lt;p>When I first moved to the Bay area I was fresh out of grad school. I was frequently heading out to dinner or to happy hour after work with colleagues. I was young and single, so why not of course. As time passed, marriage, kids, etc. the ability to go out for a quick drink or dinner was competing with various priorities. Dinner and drinks with co-workers was always a great time. It wasn&amp;rsquo;t just about hanging out, it built rapport and trust which I found made me a more effective teammate and product manager. It was about 8 years ago that I started to implement a variation of heading out for dinner and drinks.&lt;/p>
&lt;p>I started inviting people over for dinner.&lt;/p>
&lt;p>I still do this regularly. Roughly once a week we end up hosting someone for dinner. Sometimes it is a single person, sometimes it is a group of people. Sometimes it is co-workers, sometimes former colleagues, often friends that don&amp;rsquo;t work in tech. Growing up in the south it was common to have people over; I&amp;rsquo;d say we did that just as much as going out to dinner with folks. You&amp;rsquo;d get an invite to go to someone else&amp;rsquo;s place and you&amp;rsquo;d show up with a bottle of wine or flowers in hand. Initially when I asked people in the Bay Area over for dinner I&amp;rsquo;d get weird looks. Over? Like to your house?&lt;/p>
&lt;p>The reaction from folks at the end of the night was very often&amp;hellip; that was really fun. Thanks for the invite, I can&amp;rsquo;t remember the last time I just sat down at someone&amp;rsquo;s place, had a good meal, and good conversation.&lt;/p>
&lt;p>Once I found early success with this I started implementing it pretty methodically. When remote workers were in town I&amp;rsquo;d make sure to place them at the top of the list to come over if the scheduling worked. Same when friends visited from out of town. I&amp;rsquo;d also try to regularly rotate through my teams and those that reported to me. At one point when I had 22 engineers that I was leading product for, I had to do a bit of juggling and stagger things: groups of 4 folks or so at a time, with each group over about once every 6 months. &lt;em>I made life a bit easier on myself by grouping vegans together, and vegetarians, and meat eaters. When you&amp;rsquo;re cooking in the Bay Area you probably don&amp;rsquo;t have a 500 sq. foot kitchen, so cooking 2-3 different meals adds complexity. Thus simplifying.&lt;/em>&lt;/p>
&lt;p>All of this was for selfish reasons. I could excuse myself for 15 minutes to give the kids a bath, to read them a story, to tuck them in. At the same time I was able to continue building a rapport with co-workers that allowed us both to work together better.&lt;/p></description></item><item><title>Talking on the phone: better communication</title><link>/2019/04/24/Talking-on-the-phone-better-communication/</link><pubDate>Wed, 24 Apr 2019 12:55:56 -0800</pubDate><guid>/2019/04/24/Talking-on-the-phone-better-communication/</guid><description>&lt;p>I interact with a lot of people in a given week, a few in person and far more on video and conference calls. I don&amp;rsquo;t claim to be a perfect person to talk to on the phone, but over the past several years I&amp;rsquo;ve noticed how painful some conference call experiences can be. As more and more work is conducted virtually and not face to face, the ability to communicate well on conference/voice calls is tied to the success you can deliver. &lt;em>It isn&amp;rsquo;t about having a fancy phone or high bandwidth video call–though that can at times be useful.&lt;/em>&lt;/p>
&lt;h3 id="lets-start-with-localremote-teams" >
&lt;div>
Let&amp;rsquo;s start with local/remote teams
&lt;/div>
&lt;/h3>
&lt;p>This one is a huge problem as companies go from 100% local to partially remote. When you start a meeting take note of how many folks are local vs. remote. If the majority of attendees are on a video or conference call you&amp;rsquo;re likely fine. If however the majority is local, you need to make a concerted effort to pause and ask for input from the phone. This is doubly true on large conference calls&amp;hellip; in those cases it can be best to explicitly call on remote folks to chime in.&lt;/p>
&lt;h3 id="talk-to-the-phone" >
&lt;div>
Talk to the phone
&lt;/div>
&lt;/h3>
&lt;p>When you&amp;rsquo;re in person folks tend to wander, look away, or work on a whiteboard. All of these things can be very collaborative in person but are absolutely terrible on the phone. When you&amp;rsquo;re not speaking into the microphone you will immediately come across as choppy. The subtle takeaway for others is that you&amp;rsquo;re not a professional. That doesn&amp;rsquo;t mean you&amp;rsquo;re doing it intentionally, but when interacting via voice you have to be extra intentional to come across how you intend.&lt;/p>
&lt;h3 id="strong-short-words" >
&lt;div>
Strong, short words
&lt;/div>
&lt;/h3>
&lt;p>This applies when in person, but is especially true on the phone where you can&amp;rsquo;t read the other person. If you don&amp;rsquo;t have visual cues to feed off from the other person then you need to listen for audible ones. If you&amp;rsquo;re talking for 5 minutes non-stop then you&amp;rsquo;ve learned nothing about how the other person is reacting. At times you may need to talk that long. I like to talk, but after 5 minutes talking non-stop on the phone I make sure to stop and acknowledge I&amp;rsquo;ve been talking too long. One, it is only fair; and two, it is a conversation and you need to ensure it&amp;rsquo;s flowing both ways.&lt;/p>
&lt;h3 id="gone-with-the-wind" >
&lt;div>
Gone with the wind
&lt;/div>
&lt;/h3>
&lt;p>Your environment and surroundings can be absolutely terrible for a productive conversation. If you&amp;rsquo;re walking and there is any sense of wind, or if the room is echo-ey, fix it. I&amp;rsquo;ve seen this both with people taking calls in transit and with conference rooms that weren&amp;rsquo;t well designed. If you&amp;rsquo;ll be taking calls in transit, invest in a great headset: test several and read some reviews. If it is a conference room, actively ask others on calls with you how it sounds. Just assuming it&amp;rsquo;s fine will lead to people silently concluding you&amp;rsquo;re not serious, because calls with others are smoother.&lt;/p>
&lt;h3 id="overcommunicate" >
&lt;div>
Overcommunicate
&lt;/div>
&lt;/h3>
&lt;p>Speed up, slow down, raise your voice, lower your voice. On the phone you only have one form of expression: your voice. Use it. For some this takes practice. Whenever I&amp;rsquo;m talking to press or analysts I make sure I&amp;rsquo;m standing. I make sure not to have coffee 3-4 hrs before, because otherwise I&amp;rsquo;m overly animated. Make sure to take pauses. An explicit pause can give others a chance to chime in and ask questions. When you hear someone on the phone asking a question after you&amp;rsquo;ve already started talking because of latency: finish, then apologize, and ask them to go ahead.&lt;/p>
&lt;h3 id="be-intentional" >
&lt;div>
Be intentional
&lt;/div>
&lt;/h3>
&lt;p>No matter who you are or what you do, you&amp;rsquo;re on the phone plenty, whether a 1:1 call or a conference call. Most of us don&amp;rsquo;t like these calls and feel drained or frustrated by them. I don&amp;rsquo;t claim that most of my calls feel like a Friday night out at a baseball game, but by being explicit in how we communicate I do feel that you can make calls not feel like busy work. Instead they can be a productive way to communicate if you make sure you&amp;rsquo;re communicating clearly and explicitly, but also listening and engaging equally with the other side of the line.&lt;/p></description></item><item><title>Using email as an effective tool</title><link>/2019/04/16/Using-email-as-an-effective-tool/</link><pubDate>Tue, 16 Apr 2019 12:55:56 -0800</pubDate><guid>/2019/04/16/Using-email-as-an-effective-tool/</guid><description>&lt;p>I send way too many emails in a day. My inbox is very intermingled with my to do list and often represents some form of it. More relevant though is that email is a primary means of how I accomplish work. Being a PM I work cross functionally with other teams (from marketing, to engineering, to sales, to BD, to other product teams) and of course customers. Having to work so cross functionally, I&amp;rsquo;ve found a lot of hacks that help me better accomplish my goals with email; here is a collection of some of those.&lt;/p>
&lt;p>&lt;em>Let me be clear: this is not another post about inbox zero or how I swapped to Slack. This instead is about how I use email to more effectively communicate and get people to engage. In other words it is about making emails more useful, not just getting through them faster. And now onto those tips.&lt;/em>&lt;/p>
&lt;h3 id="dont-leave-a-document-in-a-document" >
&lt;div>
Don&amp;rsquo;t leave a document in a document
&lt;/div>
&lt;/h3>
&lt;p>Often times I&amp;rsquo;ve found folks will collaborate in a document during a meeting or take notes there. After the meeting folks will email the document around, but few actually open it. Maybe they&amp;rsquo;re not logged in on their phone, or maybe they just don&amp;rsquo;t care that much; I&amp;rsquo;m not really sure. What I do know is that by taking the notes and action items from the doc and including them in the email you will get more people paying attention to them. If you really want to get good at this, when there are significant revisions of a work-in-progress document, re-circulate the updated version via email. Again, not just a link to it, but the document itself.&lt;/p>
&lt;h3 id="mail-merge-isnt-just-for-marketing" >
&lt;div>
Mail merge isn&amp;rsquo;t just for marketing
&lt;/div>
&lt;/h3>
&lt;p>Years ago I sent a company-wide request for feedback (to about 120 people). I got fewer than 3 responses. The next time I needed the same thing I ran an experiment using a mail merge, so the same email seemed personal and was from me to them instead of from some large alias. I got a response rate of over 45%. Use this wisely&amp;hellip; not every email you send needs a response. A broad update can absolutely be to team@, but when you need actual feedback and people don&amp;rsquo;t seem to chime in, this approach works great.&lt;/p>
&lt;h3 id="conversation-begets-conversation" >
&lt;div>
Conversation begets conversation
&lt;/div>
&lt;/h3>
&lt;p>Often times when there is an email to team@ folks simply click archive. A great tip for managers and leadership is to chime in on these emails right away. It can help promote some positive discussion and cause folks to actually take notice of updates that might otherwise end up archived. Be careful of the inverse of this though: replying to all mass emails can create a culture where everyone thinks it&amp;rsquo;s their job to +1 and thumbs up things. There is a balance to this one between team productivity and team morale.&lt;/p>
&lt;h3 id="text-is-boring" >
&lt;div>
Text is boring
&lt;/div>
&lt;/h3>
&lt;p>Email is not like reading a novel &lt;em>(except when my colleague Daniel Farina writes them)&lt;/em>; if I wanted to read a novel I&amp;rsquo;d actually go read a novel. For email I want the clear, concise points broken out. You should think about how easy it is to read on a phone. If a single paragraph consumes my entire screen you&amp;rsquo;ve lost your reader&amp;rsquo;s attention and that attention goes elsewhere. A number of things can help readability:&lt;/p>
&lt;ul>
&lt;li>Short paragraphs and usage of line breaks&lt;/li>
&lt;li>Bullets are your friend (aren&amp;rsquo;t you enjoying this list already?)&lt;/li>
&lt;li>Numbers are also great as they call attention to themselves&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Bold/italics can also be useful, but assume many people use text-only clients, so you need to be careful. Also, really don&amp;rsquo;t over-use these.&lt;/em>&lt;/p>
&lt;h3 id="add-extra-tooling" >
&lt;div>
Add extra tooling
&lt;/div>
&lt;/h3>
&lt;p>Gmail and Outlook are both continuing to improve over time. With the demise of Inbox, Gmail itself is gaining more of those features. Reminders about old, unanswered emails are great. But you can do much more by adding in a third-party tool; &lt;a href="http://www.boomeranggmail.com/referral_download.html?ref=vsz82">Boomerang&lt;/a> and Yesware are two popular ones. These can help with things like returning an email to your inbox in a couple weeks, so it can be out of sight for now but come back later. You can also schedule emails to be sent at a time that may be more ideal for someone to read them: instead of sending at midnight, what about 9am so it appears nearer the top of their inbox?&lt;/p>
&lt;h3 id="do-you-know-about-muting" >
&lt;div>
Do you know about muting?
&lt;/div>
&lt;/h3>
&lt;p>Muting is the single greatest feature within gmail. Email is a great place for an archive and history, it is a great place to clearly communicate things, it can be a horrible place when it comes to generating a bunch of noise.&lt;/p>
&lt;p>Personally I subscribe to lots of lists. I have filters for many of those, but many I actually want to see. That doesn&amp;rsquo;t mean I care about every response to every thread on those lists. When engineering is discussing whether to use CircleCI vs. Travis I see that the discussion is happening but don&amp;rsquo;t care too much about the outcome. Mute will silence that entire thread and move it to my archive. It will only re-appear if I am explicitly added to the To: line.&lt;/p>
&lt;p>The fact that it persists in my archive is key for me. If I want to go back later and search to see the outcome I can, but I don&amp;rsquo;t have to create a new rule, or archive each reply to the thread.&lt;/p>
&lt;h3 id="these-are-just-a-few" >
&lt;div>
These are just a few
&lt;/div>
&lt;/h3>
&lt;p>These are only a few tips, but ones I&amp;rsquo;ve found extremely useful on two fronts. The first is being more in control of my inbox so it doesn&amp;rsquo;t control me; a workflow that works for me, which includes muting, filtering lists, and scheduling emails for later, helps with that. The second is that, to me, email isn&amp;rsquo;t something I have to do; it is a tool to work more effectively. Being intentional about how I construct messages, when I send them, and what is clearly communicated is key for my job.&lt;/p>
&lt;p>I&amp;rsquo;m curious: what are some of your favorite email tips? Let me know &lt;a href="https://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>.&lt;/p></description></item><item><title>OKRs aren't going to fix your communication issues</title><link>/2019/03/30/OKRs-arent-going-to-fix-your-communication-issues/</link><pubDate>Sat, 30 Mar 2019 12:55:56 -0800</pubDate><guid>/2019/03/30/OKRs-arent-going-to-fix-your-communication-issues/</guid><description>&lt;p>Talking with a startup founder a few days ago, he asked for my opinions on OKRs. I have slightly mixed opinions on them overall and started to share some of those. In sharing them I had a few immediate realizations that might be broadly applicable. The crux of his question was: at what stage should we put them in place? I&amp;rsquo;ve seen a few companies try to put in some form of OKR, and most were met with pretty mixed results. The reason is that OKRs need to change something about your behavior, otherwise why put them in place&amp;hellip; they should either change the goals you would otherwise have or the methods by which you go about achieving them.&lt;/p>
&lt;p>Stepping back a bit, my first question, and a very focusing question in almost any situation, is &amp;ldquo;What problem are we trying to solve?&amp;rdquo; In our conversation he paused a bit. As the pause stretched on, it became clear that question had not been fully asked or answered.&lt;/p>
&lt;p>The first and most common case I see with startups trying to put in place OKRs, V2MOMs, or management by objectives is that the team is not aligned and focused on the same goals. But my follow-on question is consistently: have you communicated what you decided your goals were?&lt;/p>
&lt;p>Startups tend to go through some distinct growing phases. In the early stages all the founders are in a room together building out the product. When you get the first few engineers you expand out a little, but still fit easily in a single co-working conference room. Eventually you need a real office. At the real-office stage you start to have an all hands, probably gathered around a large lunch table at first. At that all hands no one takes meeting minutes and sends out a recap; instead people take some notes and you assume everyone was present.&lt;/p>
&lt;p>But at about 20 people you have at least one person that misses the weekly team meeting and misses something key. In a 1:1 you catch that it was talked about as a priority&amp;hellip; but they weren&amp;rsquo;t there. I&amp;rsquo;ve seen this very subtle change linger all the way up to a 70/80 person org. I&amp;rsquo;ve observed management meetings where someone was missing and that key member was entirely mis-aligned on what the goals were for months afterward.&lt;/p>
&lt;p>That was a long detour, but the point is that explicit formal communication is a big change for early stage companies. Distributed teams tend to do this better than in person teams, but it is also not guaranteed.&lt;/p>
&lt;p>OKRs present a heavy-weight answer to the problem. OKRs tend to require hours and maybe even days to determine the right goals and metrics. Even if they are quick, does the process of OKRs significantly change how you structure your team and work for the next few weeks/months? If not, could you much more easily get away with&amp;hellip; wait for it&amp;hellip; emailing out what the company&amp;rsquo;s priorities are? Email out the meeting notes from your all hands meeting. Email out (gasp) a recap of what you discussed and are thinking about as a management team. Sure, every manager could go have a 1:1 and recap the points for 30 minutes with each of their employees. Or you could use this thing that we&amp;rsquo;ve had for a little while&amp;hellip; email.&lt;/p></description></item><item><title>Tips for your first tech conference</title><link>/2019/03/18/Tips-for-your-first-tech-conference/</link><pubDate>Mon, 18 Mar 2019 12:55:56 -0800</pubDate><guid>/2019/03/18/Tips-for-your-first-tech-conference/</guid><description>&lt;p>I make it to a lot of conferences these days. I often see colleagues, former colleagues, and friends at these conferences. Sometimes it is friends I haven&amp;rsquo;t seen in a few years, sometimes I just saw the same person in a different country the week before. Conferences now are much easier for me; in fact it is a bit hard to recall what the experience was like when I first started attending. But I&amp;rsquo;m at least going to try to give some input so others can have a smoother first experience.&lt;/p>
&lt;h3 id="most-people-know-no-one" >
&lt;div>
Most people know no one
&lt;/div>
&lt;/h3>
&lt;p>There is a very strong chance you&amp;rsquo;re attending your first conference by yourself or with just a single colleague you know. Being suddenly surrounded by thousands of people you don&amp;rsquo;t know can be intimidating. The reality though is most folks there are just like you and know very few other people. Most people don&amp;rsquo;t go to multiple conferences in a year; they go to one. There are often a lot of opportunities to get to know folks. Some of the best times are during breaks between talks, and whenever there is food or beverages. If breakfast is served at the conference, as much as you&amp;rsquo;d like to sleep in, wake up and go have breakfast and chat with someone. At lunch if there is a half-full table, go ask if a seat is taken and join it vs. sitting at a completely open one. During the breaks don&amp;rsquo;t rush to get to the next talk 10 minutes before it begins; stay and mingle with folks.&lt;/p>
&lt;p>If you&amp;rsquo;re really feeling shy, wander the booth floor. Those folks are paid to be there to talk to you. Let them tell you about their product or service; it&amp;rsquo;ll at least be a start to some flowing conversation.&lt;/p>
&lt;h3 id="speakers-can-be-your-friends-too" >
&lt;div>
Speakers can be your friends too
&lt;/div>
&lt;/h3>
&lt;p>If you&amp;rsquo;re new, speakers can often seem extra intimidating to get to know. Don&amp;rsquo;t let that be the case. If one is talking about a topic you&amp;rsquo;re excited about, hang around after the talk and discuss it with them. Speakers aren&amp;rsquo;t necessarily any more technically advanced than you are; they just happened to get a talk accepted.&lt;/p>
&lt;h3 id="study-the-agenda" >
&lt;div>
Study the agenda
&lt;/div>
&lt;/h3>
&lt;p>The default is to look at the list of talks being presented and pick out which ones you want to attend. But pay closer attention: at many conferences there are a bunch of other great activities to participate in. PyCon is a model example of this. In addition to all the talks there is:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://us.pycon.org/2019/events/auction/">PyLadies Auction&lt;/a> - Which is a great night of dinner, drinks, and an auction of fun Python focused or other items which benefits PyLadies.&lt;/li>
&lt;li>5k run - You wake up early in the morning and run (it should be noted that 5:30 am does not sound fun to me, but to each his own)&lt;/li>
&lt;li>&lt;a href="https://us.pycon.org/2019/events/dinners/">Evening dinners&lt;/a> - If you&amp;rsquo;re worried about having dinner plans then this is a great option at a fun venue already coordinated.&lt;/li>
&lt;/ul>
&lt;p>If you&amp;rsquo;re not at something as structured as PyCon then there are a few places you can find what is happening. These days conferences often have a Slack channel: join it, find out when people get into town, find out if people like to hike or visit breweries, and join in with others. There is also Twitter. Things on Twitter don&amp;rsquo;t get planned far in advance, but often you can find folks coordinating things last minute.&lt;/p>
&lt;h3 id="stay-at-the-conference-hotel" >
&lt;div>
Stay at the conference hotel
&lt;/div>
&lt;/h3>
&lt;p>If there is an official conference hotel or hotels, stay there. This opinion may be a little more controversial, but stay with me. First, you&amp;rsquo;ll find folks hanging around the hotel bar afterwards or just sitting around mingling. It&amp;rsquo;s just one more way to not feel isolated. Second, it often helps the conference. Conferences can take on big risk by committing to a minimum number of rooms filled at a hotel. They do this to get good rates, but it presents risk for the organizers. Staying at the hotel both keeps you around others that are there for the conference and helps the conference itself.&lt;/p>
&lt;h3 id="you-dont-have-to-attend-all-the-talks" >
&lt;div>
You don&amp;rsquo;t have to attend all the talks
&lt;/div>
&lt;/h3>
&lt;p>Talks are super helpful. You can learn a lot in a compressed period of time and often come away knowing where to go next to learn further. Great ones teach you something and are entertaining presentations as well. But you don&amp;rsquo;t have to attend all the talks. Many conferences record the talks and make them available later. Personally I make a note of the ones I wanted to see, as well as ones that others said were especially great. Then I go to YouTube and watch them at 1.5 or 2x speed to get through more of them in less time.&lt;/p>
&lt;h3 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h3>
&lt;p>Going to a conference may absolutely feel intimidating if it is your first. There will be a little of that first-day-of-school jitters: what do you wear, who will you talk to, what if no one sits with you at lunch. Just know that for most people there it really is like the first day of school for them too. Even though it may be intimidating, hopefully the above helps you jump in with a little less apprehension.&lt;/p></description></item><item><title>Give me back my monolith</title><link>/2019/03/13/Give-me-back-my-monolith/</link><pubDate>Wed, 13 Mar 2019 12:55:56 -0800</pubDate><guid>/2019/03/13/Give-me-back-my-monolith/</guid><description>&lt;p>It feels like we&amp;rsquo;re starting to pass the peak of the hype cycle of microservices. We no longer see a blog post multiple times a week titled &amp;ldquo;How I migrated my monolith to 150 services&amp;rdquo;. Now I often hear a bit more of the counter: &amp;ldquo;I don&amp;rsquo;t hate my monolith, I just care that things stay performant&amp;rdquo;. We&amp;rsquo;ve actually seen some migrations from &lt;a href="https://segment.com/blog/goodbye-microservices/">micro-services back to a monolith&lt;/a>. When you go from one large application to multiple smaller services there are a number of new things you have to tackle; here is a rundown of all the things that were simple that you now get to re-visit:&lt;/p>
&lt;h3 id="setup-went-from-intro-chem-to-quantum-mechanics" >
&lt;div>
Setup went from intro chem to quantum mechanics
&lt;/div>
&lt;/h3>
&lt;p>Setting up a basic database and my application with a background process used to be a pretty well defined process. I&amp;rsquo;d have the readme on GitHub, and often within an hour or maybe a few I&amp;rsquo;d be up and running when I started on a new project. Onboarding a new engineer, at least for an initial environment, would be done in the first day. As we ventured into micro-services, onboarding time skyrocketed. Yes, we have Docker and orchestration such as K8s these days to help, but the time from start to a running K8s cluster just to onboard a new engineer is orders of magnitude larger than what we saw a few years ago. For many junior engineers this is a burden that really is unnecessary complexity.&lt;/p>
&lt;h3 id="so-long-for-understanding-our-systems" >
&lt;div>
So long for understanding our systems
&lt;/div>
&lt;/h3>
&lt;p>Let&amp;rsquo;s stay on the junior engineer perspective for just a moment. Back when we had monolithic apps, if you had an error you had a clear stacktrace to see where it originated from and could jump right in and debug. Now we have a service that talks to another service, that queues something on a message bus, that another service processes, and then we have an error. We have to piece together all of these pieces to eventually learn that service A was on version 11 and service Q was expecting version 12 already. This is in contrast to my standard consolidated log, and let&amp;rsquo;s not forget my interactive terminal/debugger for when I wanted to go step by step through the process. Debugging and understanding are now inherently more complicated.&lt;/p>
&lt;h3 id="if-we-cant-debug-them-maybe-we-can-test-them" >
&lt;div>
If we can&amp;rsquo;t debug them, maybe we can test them
&lt;/div>
&lt;/h3>
&lt;p>Continuous integration and continuous delivery are now becoming commonplace. Most new apps I see nowadays automatically build and run their tests on a new PR and require tests to pass and review before check-in. These are great processes to have in place and have been a big shift for a lot of companies. But now to really test my service I have to bring up a complete working version of my application. Remember onboarding that new engineer with their 150-service K8s cluster? Well, now we get to teach our CI system how to bring up all those systems to actually test that things are working. That is probably a bit too much effort, so we&amp;rsquo;re just going to test each piece in isolation; I&amp;rsquo;m sure our specs were good enough that APIs are clean and service failure is isolated and won&amp;rsquo;t impact others.&lt;/p>
&lt;h3 id="all-the-trade-offs-are-for-a-good-reason-right" >
&lt;div>
All the trade-offs are for a good reason. Right?
&lt;/div>
&lt;/h3>
&lt;p>There are a lot of reasons to migrate to micro-services. I&amp;rsquo;ve heard cases for more agility, for scaling your teams, for performance, to give you a more resilient service. The reality is we&amp;rsquo;ve invested decades into development practices and tooling around monoliths, while the equivalents for micro-services are still maturing. In my &lt;a href="https://www.citusdata.com/product/cloud">day to day&lt;/a> I work with a lot of folks from all different stacks. Usually we&amp;rsquo;re talking about scaling because they&amp;rsquo;re running into limits of a single node Postgres database. Most of our conversation focuses on &lt;a href="https://www.citusdata.com/blog/2018/06/28/scaling-from-one-to-one-hundred-thousand-tenants/">scaling the database&lt;/a>.&lt;/p>
&lt;p>But in all these conversations I&amp;rsquo;m fascinated to learn about their architecture and where they are in their journey to micro-services. It has been an interesting trend to see more and more reactions of &amp;ldquo;We&amp;rsquo;re happy with our monolithic app.&amp;rdquo; The road to micro-services may work fine for lots of teams, and the benefits may outweigh the bumpy road to get there, but personally give me my monolithic app and a beach somewhere and I&amp;rsquo;ll be happy.&lt;/p></description></item><item><title>Why I love building developer products</title><link>/2019/03/12/why-i-love-building-developer-tools/</link><pubDate>Tue, 12 Mar 2019 12:55:56 -0800</pubDate><guid>/2019/03/12/why-i-love-building-developer-tools/</guid><description>&lt;p>For much of my career I&amp;rsquo;ve been focused on building out developer or data focused products, with the customer in some form or fashion being a developer on the other end. I fully realize now that I&amp;rsquo;m destined to spend the rest of my career in that space, either that or trying my hand at wine making. There are a few things that I personally find rewarding about the space that I&amp;rsquo;ve shared with a number of people individually lately and thought I would share more broadly.&lt;/p>
&lt;p>First, software really is eating the world. Often developers and entrepreneurs ask what they could build that would be a good business. The reality is that almost anything that has not been modernized as a service and improved with software could be. We have far fewer developers in the world than we need to execute on all the ways we could improve products and life. To me what is interesting is the last part of that last sentence. It is not that the market for developers is huge, which I do believe it is. It is more that when we automate with systems we can get amazing economies of scale. I know folks that reminisce about how we&amp;rsquo;re more stressed being always connected, and how in the old days people got out and worked the fields and enjoyed the sun. They also absolutely physically exhausted their bodies in the process. The ability to make life better at scale is an interesting one, and it is often achieved through the systems we develop.&lt;/p>
&lt;p>The second reason is the challenge and the reward. Developers are notoriously tough critics. If it feels/smells like marketing then they have an allergic reaction, and that is probably because much of it is done poorly. If they experience really great marketing they latch on more than the average person. Most developers are by nature a bit skeptical&amp;hellip; for some reason this resonates with me. But I&amp;rsquo;ve found they are the biggest supporters once they&amp;rsquo;re excited about something. Personally I&amp;rsquo;d rather have folks be more critical and end up with a few diehard fans than have a whole bunch of people I drive to some site with ads and convert, but never actually connect with my product/brand.&lt;/p>
&lt;p>The third and final reason is, I can understand it. Having worked at small startups, on side projects, and at a large enterprise, I can understand the challenges at each different level. Building a product that I can directly relate to and connect with somehow makes it easier to stay motivated. I&amp;rsquo;m sure Instagram has a lot of interesting technical challenges, and scaling an ad network is surely extremely challenging. But building a product that I could see myself directly benefiting from makes everyone&amp;rsquo;s input on the team valuable and key. This changes the entire dynamic from product team to engineering to marketing.&lt;/p>
&lt;p>Nowadays I don&amp;rsquo;t get much chance to ship code to production; usually that is what I threaten to do when things are running behind. But at heart I&amp;rsquo;m still very much an engineer. Building products that make developers&amp;rsquo; lives better and more productive may not change the world, but hopefully it better enables other developers to.&lt;/p></description></item><item><title>How can I help? East coast vs. West coast mentalities</title><link>/2019/03/03/how-can-i-help-east-vs-west/</link><pubDate>Sun, 03 Mar 2019 12:55:56 -0800</pubDate><guid>/2019/03/03/how-can-i-help-east-vs-west/</guid><description>&lt;p>Often when I&amp;rsquo;m traveling on the east coast, whether it is the NYC area or back home down south, I try to spend some time catching up with various people. In catching up we&amp;rsquo;ll spend some time talking about what we&amp;rsquo;re both up to, thoughts on tech or things in general, and at the end I typically ask &amp;ldquo;Is there anything in particular I can help with?&amp;rdquo; More often than not the answer to this question isn&amp;rsquo;t super substantial, which is fine. But what is surprising is the stark contrast in reactions to this question between the west coast and the east coast.&lt;/p>
&lt;p>On the east coast when I ask &amp;ldquo;How can I help?&amp;rdquo; I get a look of:&lt;/p>
&lt;ul>
&lt;li>What&amp;rsquo;s the catch?&lt;/li>
&lt;li>Okay, what is it you want me to do for you?&lt;/li>
&lt;li>There is no way you actually intend to help&amp;hellip;&lt;/li>
&lt;/ul>
&lt;p>Meanwhile on the west coast the mentality is generally quite different. I often have people showing their own willingness to help. It may be someone you know personally that is offering help, but I&amp;rsquo;ve also seen someone get asked for input or help, refer someone else, and suddenly you have two people that have known each other for under 5 minutes, one of them willing to dispense helpful advice or assistance.&lt;/p>
&lt;p>The &amp;ldquo;how can I help&amp;rdquo; mentality of the west coast seems to follow from the idea that offering assistance is not a zero sum game, and that by extending some effort now it hopefully comes back around when they need it themselves.&lt;/p></description></item><item><title>SQL: One of the most valuable skills</title><link>/2019/02/12/SQL-One-of-the-most-valuable-skills/</link><pubDate>Tue, 12 Feb 2019 12:55:56 -0800</pubDate><guid>/2019/02/12/SQL-One-of-the-most-valuable-skills/</guid><description>&lt;p>I&amp;rsquo;ve learned a lot of skills over the course of my career, but no technical skill has been more useful than SQL. SQL stands out to me as the most valuable skill for a few reasons:&lt;/p>
&lt;ol>
&lt;li>It is valuable across different roles and disciplines&lt;/li>
&lt;li>Learning it once doesn&amp;rsquo;t really require re-learning&lt;/li>
&lt;li>You seem like a superhero. &lt;em>You seem extra powerful when you know it because of the number of people that aren&amp;rsquo;t fluent&lt;/em>&lt;/li>
&lt;/ol>
&lt;p>Let me drill into each of these a bit further.&lt;/p>
&lt;h2 id="sql-a-tool-you-can-use-everywhere" >
&lt;div>
SQL a tool you can use everywhere
&lt;/div>
&lt;/h2>
&lt;p>Regardless of what role you are in, SQL will find a way to make your life easier. Today as a product manager it&amp;rsquo;s key for me to look at data, analyze how effective we&amp;rsquo;re being on the product front, and shape the product roadmap. If we just shipped a new feature, the data on whether someone has viewed that feature is likely sitting somewhere in a relational database. If I&amp;rsquo;m working on tracking key business metrics such as &lt;a href="/2014/02/26/Tracking-MoM-growth-in-SQL/">month over month growth&lt;/a>, that too is likely sitting somewhere in a relational database. At the other end of almost anything we do there is likely a system of record that speaks SQL. Knowing how to access it natively saves me a significant amount of effort without having to go ask someone else for the numbers.&lt;/p>
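&lt;p>&lt;em>As a rough sketch of that kind of metrics query (the &lt;code>users&lt;/code> table and &lt;code>created_at&lt;/code> column here are illustrative, not from any particular schema), month over month growth can be computed with an aggregate plus a window function:&lt;/em>&lt;/p>

```sql
-- Hypothetical schema: users(id, created_at)
-- Signups per month, and growth relative to the prior month.
SELECT date_trunc('month', created_at) AS month,
       count(*) AS signups,
       round(100.0 * count(*)
             / nullif(lag(count(*)) OVER (ORDER BY date_trunc('month', created_at)), 0)
             - 100, 1) AS mom_growth_pct
FROM users
GROUP BY 1
ORDER BY 1;
```

&lt;p>&lt;em>The &lt;code>nullif&lt;/code> guards against dividing by zero in the first month, where there is no prior value.&lt;/em>&lt;/p>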
&lt;p>But even before becoming a product manager I would use SQL to inform me about what was happening within systems. As an engineer it could often allow me to pull information I wanted faster than if I were to script it in, say, Ruby or Python. When things got slow in my webapp, having an understanding of the SQL that was executed and &lt;a href="/2012/10/01/understanding-postgres-performance/">ways to optimize&lt;/a> it was indispensable. Yes, this was going a little beyond just a basic understanding of SQL&amp;hellip; but adding an &lt;a href="https://www.citusdata.com/blog/2017/10/11/index-all-the-things-in-postgres/">index to a query&lt;/a> instead of rolling my own homegrown caching was well worth the extra time learning.&lt;/p>
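&lt;p>&lt;em>A minimal sketch of that workflow (the &lt;code>orders&lt;/code> table here is hypothetical): check the plan first, and if you see a sequential scan on a selective filter, try an index:&lt;/em>&lt;/p>

```sql
-- See how Postgres actually executes the slow query
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- If the plan shows a Seq Scan, an index usually helps;
-- CONCURRENTLY avoids blocking writes while it builds.
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);
```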
&lt;h2 id="sql-is-permanent" >
&lt;div>
SQL is permanent
&lt;/div>
&lt;/h2>
&lt;p>I recall creating my first webpage roughly 20 years ago. It was magical, and then I introduced some Javascript to make it even more impressive, prompting users to click Yes/No or give me some input. Then about 10 years later jQuery came along, and while it was a little more verbose at times and something new to learn, it made things prettier overall, so I committed to re-learning the jQuery approach to JS. Then it just picked up pace with Angular -&amp;gt; React/Ember, and now I have an entire pipeline to introduce basic Javascript into my website, and the reality is I&amp;rsquo;m still trying to accomplish the same thing I was 20 years ago: having someone click Yes/No.&lt;/p>
&lt;p>SQL in contrast doesn&amp;rsquo;t really change. &lt;em>Caveat: It has changed–there is &lt;a href="https://modern-sql.com/">modern sql&lt;/a>, but I&amp;rsquo;d still argue less dramatically than other language landscapes&lt;/em>. Yes we get a new standard every few years and occasionally something new comes along like support for window functions or CTEs, but the basics of SQL are pretty permanent. Learning SQL once will allow you to re-use it heavily across your career span without having to re-learn. Don&amp;rsquo;t get me wrong I love learning new things, but I&amp;rsquo;d rather learn something truly new than just yet another way to accomplish the same task.&lt;/p>
&lt;h2 id="sql-seem-better-than-you-are" >
&lt;div>
SQL: Seem better than you are
&lt;/div>
&lt;/h2>
&lt;p>SQL is an under-learned skill; the majority of application developers just skip over it. Because so few actually know SQL well, you can seem more elite than you actually are. In past companies with hundreds of engineers I&amp;rsquo;d get a question several times a week, from junior to principal engineers, of: &amp;ldquo;hey can you help me figure out how to write a query for this?&amp;rdquo; Because you&amp;rsquo;re skilled at something so few others are, you can help them out, which always makes life a little easier when you later have a question for them.&lt;/p>
&lt;p>So if you&amp;rsquo;re not already proficient, what are you waiting for? Don&amp;rsquo;t you want to seem like a SQL badass?&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>The biggest mistake Postgres ever made</title><link>/2018/10/30/postgres-biggest-mistake/</link><pubDate>Tue, 30 Oct 2018 12:55:56 -0800</pubDate><guid>/2018/10/30/postgres-biggest-mistake/</guid><description>&lt;p>Postgres has experienced a long and great run. It is over 20 years old and has a track record of being safe and reliable (which is the top thing I care about in a database). In recent years it has become more cool with things like &lt;a href="https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/">JSONB&lt;/a>, &lt;a href="https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/">JIT support&lt;/a>, and a powerful &lt;a href="https://www.citusdata.com/blog/2017/10/25/what-it-means-to-be-a-postgresql-extension/">extension&lt;/a> ecosystem. But, Postgres has made some mistakes along the way, the most notable being the name.&lt;/p>
&lt;p>Postgres gets its name from &lt;a href="https://en.wikipedia.org/wiki/Ingres_(database)">Ingres&lt;/a>. Ingres was one of the first databases and was led by Michael Stonebraker, who later won a Turing award for his work on Postgres and other database systems. Ingres began in the early 70s at UC Berkeley, which is still to this day known as a top university when it comes to databases. Out of Ingres came a number of databases you&amp;rsquo;ll still know today, such as SQL Server and Sybase. It also, as you may have guessed by now, spawned Postgres, which means post-Ingres.&lt;/p>
&lt;p>In the early days of Postgres there was no SQL. No, not NoSQL; there was no SQL. Postgres had its own query language, POSTQUEL. It wasn&amp;rsquo;t until 1995 that Postgres received SQL support, and shortly after that, in 1996, it updated its name to PostgreSQL.&lt;/p>
&lt;p>You see, with Postgres becoming PostgreSQL we began a journey of Postgres being mispronounced for the foreseeable future, and that is still the case today. Is it really that big of an issue? Well, it&amp;rsquo;s big enough that the PostgreSQL website has a FAQ entry on &lt;a href="https://www.postgresql.org/about/press/faq/">&amp;ldquo;How to pronounce PostgreSQL&amp;rdquo;&lt;/a>. As it stands today there are two generally accepted names:&lt;/p>
&lt;ul>
&lt;li>post-GRES-cue-ell&lt;/li>
&lt;li>Postgres&lt;/li>
&lt;/ul>
&lt;p>With one of the above there is far less confusion. And in fact I&amp;rsquo;m not the only one to share this opinion. &lt;a href="https://en.wikipedia.org/wiki/Tom_Lane_(computer_scientist)">Tom Lane&lt;/a> has been a major contributor to every Postgres release for more than the last decade. He&amp;rsquo;s one of the top 10 contributors to open source in general, having worked on the JPEG/PNG/TIFF image formats before coming over to database land. Tom has this &lt;a href="https://www.postgresql.org/message-id/2693.1152762174@sss.pgh.pa.us">classic email&lt;/a> in the PostgreSQL mailing list:&lt;/p>
&lt;pre tabindex="0">&lt;code> [&amp;gt;&amp;gt; Can i get data in postgre from non-postgre db?
&amp;gt; The name is PostgreSQL or Postgres, not postgre.
It might help to explain that the pronunciation is &amp;#34;post-gres&amp;#34; or
&amp;#34;post-gres-cue-ell&amp;#34;, not &amp;#34;post-gray-something&amp;#34;.
I heard people making this same mistake in presentations at this
past weekend&amp;#39;s Postgres Anniversary Conference :-( Arguably,
the 1996 decision to call it PostgreSQL instead of reverting to
plain Postgres was the single worst mistake this project ever made.
It seems far too late to change now, though.
regards, tom lane
&lt;/code>&lt;/pre>&lt;p>The best part is this mail was from 2006, when it was arguably already too late to change the name, and here we are in 2018 with the same issue.&lt;/p>
&lt;p>Personally I may start calling it Postgre just to emphasize a point, but for the rest of you just going with Postgres is probably a safe choice.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Postgres 11 - A First Look</title><link>/2018/09/20/postgresql-11-a-first-look/</link><pubDate>Thu, 20 Sep 2018 12:55:56 -0800</pubDate><guid>/2018/09/20/postgresql-11-a-first-look/</guid><description>&lt;p>Postgres 11 is almost here; in fact the latest beta shipped today, and it features a lot of exciting improvements. If you want the full list of features it is definitely worth checking out the &lt;a href="https://www.postgresql.org/docs/11/static/release-11.html">release notes&lt;/a>, but for those who don&amp;rsquo;t read the release notes I put together a rundown of what I consider the highlight features.&lt;/p>
&lt;h2 id="quitting-postgres" >
&lt;div>
Quitting Postgres
&lt;/div>
&lt;/h2>
&lt;p>This is a small usability feature, but so long overdue. Now you can quit psql by simply typing &lt;code>quit&lt;/code> or &lt;code>exit&lt;/code>. Previously you had to use Ctrl + D or &lt;code>\q&lt;/code>. As a beginner it&amp;rsquo;s one thing to jump into a psql terminal, but once in, if you can&amp;rsquo;t figure out how to quit, it&amp;rsquo;s a frustrating experience. Small usability features, such as this and &lt;code>\watch&lt;/code> in an earlier release, are often lost among the highlighted features which talk about performance or new data types. Improvements like this really go a long way toward making Postgres a better database for everyone.&lt;/p>
&lt;h2 id="fear-column-addition-no-more" >
&lt;div>
Fear column addition no more
&lt;/div>
&lt;/h2>
&lt;p>&lt;a href="https://brandur.org/postgres-default">Brandur&lt;/a> had a great in depth write-up on this feature already, but it falls somewhere into the category of the above as well as a performance improvement. Previously when you added a new column that was &lt;code>NOT NULL&lt;/code> with a default value Postgres would have to take a lock and re-write the entire table. In a production environment on any sizable table for all practical purposes the result was an outage. The work around was to break your migrations apart to be a several &lt;a href="https://blog.codeship.com/rails-migrations-zero-downtime/">step process&lt;/a>.&lt;/p>
&lt;p>With Postgres 11 you can add a new column to a table that is not null with a default value, without requiring a full re-write: the default is recorded in the catalog and materialized for existing rows when they are read. Here&amp;rsquo;s to thinking less about your migrations.&lt;/p>
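&lt;p>&lt;em>For example (the table and column names here are illustrative), this is now a quick catalog-only change on Postgres 11 rather than a full table rewrite:&lt;/em>&lt;/p>

```sql
-- Fast on Postgres 11+: the default is recorded in the catalog
-- and returned for existing rows, with no table rewrite.
ALTER TABLE accounts ADD COLUMN credits integer NOT NULL DEFAULT 0;
```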
&lt;h2 id="of-course-performance-is-a-highlight" >
&lt;div>
Of course performance is a highlight
&lt;/div>
&lt;/h2>
&lt;p>No Postgres release would be complete without some performance improvements. This release there are really two areas that feature key improvements around performance.&lt;/p>
&lt;h3 id="parallelism-continuing-to-mature" >
&lt;div>
Parallelism continuing to mature
&lt;/div>
&lt;/h3>
&lt;p>We first saw parallelism support back in PostgreSQL 9.6. At the time it was primarily for sequential scans, which was great if your workload leaned on sequential scans, but overall it was a narrow focus. In PostgreSQL 10 parallelism became much more useful, and with PostgreSQL 11 it just keeps getting better. Some of the highlights for parallelism include:&lt;/p>
&lt;ul>
&lt;li>Parallel hash joins&lt;/li>
&lt;li>Parallel append&lt;/li>
&lt;li>Parallel index creation - &lt;em>We&amp;rsquo;ve talked about how great this can be over on the &lt;a href="https://www.citusdata.com/blog/2017/01/17/parallel-indexing-with-citus/">Citus blog&lt;/a>. With it natively in Postgres it makes it even easier for people to leverage.&lt;/em>&lt;/li>
&lt;/ul>
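&lt;p>&lt;em>As one example of putting parallel index creation to use (the setting value and names here are illustrative; B-tree builds are the parallel-capable case in Postgres 11):&lt;/em>&lt;/p>

```sql
-- Allow up to 4 parallel workers for maintenance commands
SET max_parallel_maintenance_workers = 4;

-- B-tree index builds can now use those workers automatically
CREATE INDEX events_created_at_idx ON events (created_at);
```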
&lt;p>If you want to dig deeper into all the parallelism support in Postgres &lt;a href="https://speakerdeck.com/macdice/parallelism-in-postgresql-11">this presentation&lt;/a> by PostgreSQL committer Thomas Munro at &lt;a href="https://2018.postgresopen.org/">PostgresOpen Silicon Valley&lt;/a> from a few weeks ago is a great resource.&lt;/p>
&lt;h3 id="postgres-gets-a-jit" >
&lt;div>
Postgres gets a JIT
&lt;/div>
&lt;/h3>
&lt;p>Just-in-time compilation is going to be a big deal for Postgres in the coming years, and we have the initial support for it now in PostgreSQL 11. Even in this initial implementation of JIT support you can see a nearly 30% speedup on certain queries, such as those highlighted in this &lt;a href="https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/">TPC-H benchmark&lt;/a>.&lt;/p>
&lt;p>It is still early days for the just in time query compilation support, so expect the improvements here to be even better in PostgreSQL 12 and 13.&lt;/p>
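&lt;p>&lt;em>If you want to experiment (the query shown is illustrative), JIT is compiled in but disabled by default in Postgres 11; you can flip it on per session and look for the JIT details in the plan:&lt;/em>&lt;/p>

```sql
-- JIT ships disabled by default in Postgres 11
SET jit = on;

-- For queries expensive enough to trigger it, the plan
-- output gains a "JIT:" section with compile timings
EXPLAIN (ANALYZE) SELECT sum(total) FROM line_items;
```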
&lt;h2 id="statistics-keep-getting-better" >
&lt;div>
Statistics keep getting better
&lt;/div>
&lt;/h2>
&lt;p>In Postgres 10 we saw a feature that few have probably used: &lt;a href="https://www.citusdata.com/blog/2018/03/06/postgres-planner-and-its-usage-of-statistics/">&lt;code>CREATE STATISTICS&lt;/code>&lt;/a>. You see, under the covers Postgres keeps a lot of information about your database, which it uses to determine the query plan it will use when executing a query. Previously most statistics were single-column ones; with &lt;code>CREATE STATISTICS&lt;/code> you can define a correlation between two separate columns. With Postgres 11 you can now also create statistics based on expression indexes, giving you even more cases where they can help the performance of your app.&lt;/p>
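&lt;p>&lt;em>A quick illustration (the &lt;code>addresses&lt;/code> table and its columns are hypothetical): telling the planner two columns are correlated looks like this:&lt;/em>&lt;/p>

```sql
-- city and zip_code are functionally related; without extended
-- statistics the planner assumes they are independent.
CREATE STATISTICS city_zip_stats (dependencies)
  ON city, zip_code FROM addresses;

ANALYZE addresses;  -- collect the new statistics
```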
&lt;h2 id="keeping-standbys-warm" >
&lt;div>
Keeping standbys warm
&lt;/div>
&lt;/h2>
&lt;p>&lt;code>pg_prewarm&lt;/code> has been great for warming up a replica&amp;rsquo;s cache so that, should you have a failover, you&amp;rsquo;re not failing over to a cold cache. However, up until PostgreSQL 11 you&amp;rsquo;d have to run it manually or set up some scheduler such as &lt;a href="https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/">pg_cron&lt;/a>; now you can configure &lt;code>pg_prewarm&lt;/code> to run on its own at a regular interval.&lt;/p>
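&lt;p>&lt;em>A sketch of the relevant &lt;code>postgresql.conf&lt;/code> settings (the interval shown is just an example):&lt;/em>&lt;/p>

```ini
# Load pg_prewarm's autoprewarm background worker at startup
shared_preload_libraries = 'pg_prewarm'

# Periodically dump the buffer cache contents so they can be
# reloaded automatically after a restart (dump every 300s)
pg_prewarm.autoprewarm = true
pg_prewarm.autoprewarm_interval = 300s
```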
&lt;h2 id="and-more" >
&lt;div>
And more
&lt;/div>
&lt;/h2>
&lt;p>PostgreSQL 11 is packed with more features than I&amp;rsquo;ve seen in a release before, though I think I&amp;rsquo;ve also said that before. It will be exciting to see several of these features such as the JIT support, statistics, and others as it is still in the early days for them. Meanwhile we have a great set of new features to improve user experience as well as help performance with parallelism. If you&amp;rsquo;re curious to get your hands on these give the &lt;a href="https://www.postgresql.org/about/news/1890/">beta&lt;/a> a try and send your feedback to the PostgreSQL community.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>PostgresOpen 2018 - First look at talks</title><link>/2018/06/27/postgresopen-talk-list/</link><pubDate>Wed, 27 Jun 2018 12:55:56 -0800</pubDate><guid>/2018/06/27/postgresopen-talk-list/</guid><description>&lt;p>&lt;a href="https://2018.postgresopen.org/">PostgresOpen&lt;/a> is just a few months away and our &lt;a href="https://postgresql.us/events/pgopen2018/sessions/">list of talks&lt;/a> is now live and available on the PostgresOpen website. This year selecting the talks was the hardest yet, due not only to the number of talk submissions but also to the across-the-board high quality of the submissions. There is hopefully something for everyone among the talks, at least if you like Postgres that is.&lt;/p>
&lt;p>If you&amp;rsquo;re thinking about joining us I&amp;rsquo;d love to see you there and buy you a beer or coffee. The conference is September 5-7 in downtown San Francisco, and early bird tickets are open for just another few weeks. If you want to save some money, grab your &lt;a href="https://2018.postgresopen.org/tickets/">ticket&lt;/a> and &lt;a href="https://2018.postgresopen.org/venue/">room&lt;/a> now before prices jump.&lt;/p>
&lt;p>But, if you&amp;rsquo;re curious for a sampling of a few of the talks I thought I&amp;rsquo;d break down my top five I&amp;rsquo;m personally most excited about:&lt;/p>
&lt;h3 id="debugging-the-postgres-plannerhttpspostgresqluseventspgopen2018sessionssession551-debugging-the-postgres-planner" >
&lt;div>
&lt;a href="https://postgresql.us/events/pgopen2018/sessions/session/551-debugging-the-postgres-planner/">Debugging the Postgres planner&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>Okay, this one immediately caught my attention. Melanie will start with the basics of an explain plan, progress down into an actual bug within the Postgres planner, show how you can debug it in Postgres, and then write a patch for the fix herself. This talk is well beyond my depth, as I&amp;rsquo;ll likely never contribute code to the Postgres planner, but it seems extremely entertaining and likely to highlight both performance profiling and useful debugging tips.&lt;/p>
&lt;h3 id="cleaning-out-crocodiles-teeth-with-postgresql-indexeshttpspostgresqluseventspgopen2018sessionssession506-cleaning-out-crocodiles-teeth-with-postgresql-indexes-a-story-on-all-the-index-types-in-pg" >
&lt;div>
&lt;a href="https://postgresql.us/events/pgopen2018/sessions/session/506-cleaning-out-crocodiles-teeth-with-postgresql-indexes-a-story-on-all-the-index-types-in-pg/">Cleaning out Crocodiles teeth with PostgreSQL indexes&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>I saw Louise give a super practical talk on understanding explain this year at PgDay Paris. It was both valuable for application developers that aren&amp;rsquo;t Postgres experts and surfaced new knowledge for those that thought they already understood explain. I&amp;rsquo;m excited to hear her take on indexes, but maybe even more excited for the storytelling that will come along with this talk. A talk built around a story is always a bit easier to follow than simply the technical facts.&lt;/p>
&lt;h3 id="how-postgresql-extension-apis-are-changing-the-face-of-relational-databaseshttpspostgresqluseventspgopen2018sessionssession491-how-postgresql-extension-apis-are-changing-the-face-of-relational-databases" >
&lt;div>
&lt;a href="https://postgresql.us/events/pgopen2018/sessions/session/491-how-postgresql-extension-apis-are-changing-the-face-of-relational-databases/">How PostgreSQL extension APIs are changing the face of relational databases&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;ve said it before personally that Postgres is becoming more of a data platform than simply a relational database. Part of that is flexibility towards datatypes and the broad use cases Postgres can support from OLTP, to OLAP, to &lt;a href="https://www.citusdata.com/blog/2018/06/07/what-is-citus-good-for/">HTAP&lt;/a>. The other big part is extensions! Extensions allow Postgres to continue to advance outside of the standard Postgres core codebase and release cycle. The list of extensions (&lt;a href="https://postgis.net/">PostGIS&lt;/a>, &lt;a href="https://www.citusdata.com/blog/2017/04/04/distributed_count_distinct_with_postgresql/">HyperLogLog&lt;/a>, &lt;a href="https://github.com/pgpartman/pg_partman">pg_partman&lt;/a>, &lt;a href="https://www.citusdata.com">Citus&lt;/a>) in this talk is pretty great, and curious to hear more on the APIs themselves.&lt;/p>
&lt;h3 id="connection-pooling-101httpspostgresqluseventspgopen2018sessionssession570-connection-pooling-101" >
&lt;div>
&lt;a href="https://postgresql.us/events/pgopen2018/sessions/session/570-connection-pooling-101/">Connection Pooling 101&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>Okay, connection pooling has been talked about before. Yet still SO MANY people don&amp;rsquo;t use it in production. Connection pooling can have as large of an impact as understanding explain. In this talk Samantha looks at the pros/cons of connection pooling with Postgres, and it should provide some good guidance for getting things set up.&lt;/p>
&lt;h3 id="hot---understaning-this-important-optimization-updatehttpspostgresqluseventspgopen2018sessionssession517-hot-understanding-this-important-update-optimization" >
&lt;div>
&lt;a href="https://postgresql.us/events/pgopen2018/sessions/session/517-hot-understanding-this-important-update-optimization/">HOT - Understaning this important optimization update&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>Grant&amp;rsquo;s talk last year on tuning Postgres for &lt;a href="https://www.youtube.com/watch?v=xrMbzHdPLKM">high write workloads&lt;/a> was a great one that covered not only practical tips but some of the details of how things work under the covers. This looks like a great follow-on focusing on heap only tuples and how they can greatly improve things like bloat and overall write throughput.&lt;/p>
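&lt;p>&lt;em>A small sketch of how you can watch heap-only tuples in action ahead of the talk. The table name is made up for illustration; a lower fillfactor leaves free space on each page, which makes HOT updates more likely:&lt;/em>&lt;/p>

```sql
-- Leave 10% of each page free so an updated row can stay on the same page
CREATE TABLE events (id bigint PRIMARY KEY, payload text)
WITH (fillfactor = 90);

-- Compare total updates to heap-only updates for the table
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'events';
```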
&lt;h2 id="see-you-there" >
&lt;div>
See you there
&lt;/div>
&lt;/h2>
&lt;p>If you have any questions about the conference I&amp;rsquo;d be happy to answer them, though hopefully based on the short sampling of talks you have all the reason you need to join us in September.&lt;/p></description></item><item><title>Same great Postgres with a new player in town</title><link>/2018/03/20/same-great-postgres-new-player-in-town/</link><pubDate>Tue, 20 Mar 2018 12:55:56 -0800</pubDate><guid>/2018/03/20/same-great-postgres-new-player-in-town/</guid><description>&lt;p>Many of us have known how great &lt;a href="http://www.craigkerstiens.com/2012/04/30/why-postgres/">Postgres&lt;/a> was for &lt;a href="http://www.craigkerstiens.com/2017/04/30/why-postgres-five-years-later/">years&lt;/a>.&lt;/p>
&lt;p>&lt;em>In fact I recall a conversation about 6 years ago with some sales engineers who had previously worked for a large database vendor down in Redwood City that really no one likes. They remarked that the biggest threat to them was Postgres. At first they were able to just brush it off, saying it was open source and no real database could be open source. Then as they dug in they realized there was more there than most knew about, and they would have to continually find ways to discredit it in sales conversations. Well, it doesn&amp;rsquo;t look like those SEs, or the rest of that company, were too successful.&lt;/em>&lt;/p>
&lt;p>Postgres is certainly &lt;a href="https://www.infoworld.com/article/3261571/database/how-postgresql-just-might-replace-your-oracle-database.html">having its moment&lt;/a>, and I personally don&amp;rsquo;t expect it to fade soon.&lt;/p>
&lt;p>An equally interesting shift I&amp;rsquo;ve watched from the outside has been Microsoft&amp;rsquo;s shift to support and engage with the &lt;a href="http://redmonk.com/sogrady/2017/09/28/microsoft-hiring/">open source movement&lt;/a>. Personally that shift is extremely exciting to see, especially today as they announce the general availability of their &lt;a href="https://azure.microsoft.com/en-us/blog/announcing-general-availability-of-azure-database-services-for-mysql-and-postgresql/">Postgres&lt;/a> offering. And with their announcement it looks like they&amp;rsquo;re not just dabbling but shipping a very compelling offering; notably, high availability is built-in, which means they&amp;rsquo;re very much targeting production workloads. With their GA release there are a number of interesting boxes checked:&lt;/p>
&lt;ul>
&lt;li>HIPAA, SOC, and ISO compliance&lt;/li>
&lt;li>99.99% uptime SLA&lt;/li>
&lt;li>Available in 22 regions&lt;/li>
&lt;/ul>
&lt;p>Personally I&amp;rsquo;m looking forward to the new competition for Postgres users as it&amp;rsquo;ll make Postgres better and better for all. When I was at Heroku running product for Heroku Postgres 7 years ago, we were the only major provider. Today that landscape looks a lot different, and it just means more choice and more quality if you want to run Postgres. So welcome Microsoft, I look forward to giving Azure Postgres a try.&lt;/p></description></item><item><title>Postgres hidden gems</title><link>/2018/01/31/Postgres-hidden-gems/</link><pubDate>Wed, 31 Jan 2018 12:55:56 -0800</pubDate><guid>/2018/01/31/Postgres-hidden-gems/</guid><description>&lt;p>Postgres has a rich set of features; even working with it every day, you may not discover all it has to offer. In hopes of learning some new features I didn&amp;rsquo;t know about myself, as well as seeing what small gems people found joy in, I tweeted out to see what people came back with. The response was impressive, and rather than have it lost in the ether of Twitter I&amp;rsquo;m capturing some of the responses here, along with resources for many of the features.&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;ul>
&lt;li>&lt;a href="https://www.twitter.com/listrophy">@listrophy&lt;/a> - &lt;code>$ brew postgresql-update database&lt;/code>
&lt;ul>
&lt;li>&lt;em>Though personally I prefer Postgres.app ;)&lt;/em>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/pat_shaugnessy">@pat_shaugnessy&lt;/a> - ltree
&lt;ul>
&lt;li>Pat has a great post that walks through &lt;a href="http://patshaughnessy.net/2017/12/13/saving-a-tree-in-postgres-using-ltree">ltree&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/billyfung">@billyfung&lt;/a> - citext
&lt;ul>
&lt;li>A really handy datatype for case insensitive text&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/eeeebbbbrrrr">@eeeebbbbrrrr&lt;/a> - date math with intervals
&lt;ul>
&lt;li>I couldn&amp;rsquo;t agree more on this one, &lt;a href="/2017/06/08/working-with-time-in-postgres/">working with time in Postgres&lt;/a> is the easiest time I&amp;rsquo;ve ever had&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/DataMiller">@DataMiller&lt;/a> - The jsonb datatype and lateral joins
&lt;ul>
&lt;li>I&amp;rsquo;d argue it&amp;rsquo;s hard to claim JSONB is still a hidden gem, but &lt;a href="https://blog.heapanalytics.com/postgresqls-powerful-new-join-type-lateral/">lateral joins&lt;/a> are certainly a great one&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/ideasasylum">@ideasasylum&lt;/a> - row_number() over(partition &lt;a href="http://orders.site">http://orders.site&lt;/a>_id order by orders.created_at)
&lt;ul>
&lt;li>&lt;a href="https://robots.thoughtbot.com/postgres-window-functions">Window functions&lt;/a> are definitely a handy feature
was my hidden (to me) discovery this week&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/franckverrot">@franckverrot&lt;/a> - Index access method, and custom FDWs&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/jonjensen0">@jonjensen0&lt;/a> - &lt;a href="https://www.postgresql.org/docs/9.5/static/functions-srf.html">Set-returning functions&lt;/a> and custom aggregate functions can be very helpful.&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/ascherbaum">@ascherbaum&lt;/a> - psql -x
&lt;ul>
&lt;li>Psql is indeed awesome and can be &lt;a href="http://www.craigkerstiens.com/2013/02/13/How-I-Work-With-Postgres/">well tuned&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/Abstr_ct">@Abstr_ct&lt;/a> - The fact that the docs are fantastic and all hidden gems are actually readily available. Oh, and pl/brainfuck obviously&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/Halpin_IO">@Halpin_IO&lt;/a> - Subnetting and network operations&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/jkatz05">@jkatz05&lt;/a> - Replication slots, both physical and logical. They&amp;rsquo;ve made setting up replication infinitely easier. And range types. Because they&amp;rsquo;re awesome.&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/petereisentraut">@petereisentraut&lt;/a> - Unicode table borders&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/_avichalp">@_avichalp&lt;/a> Notify/listen&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/simonw">@simonw&lt;/a> - The fact that &lt;a href="https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/">GIN indices&lt;/a> can make LIKE queries run fast even if the % isn&amp;rsquo;t just at the end of the string&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/javisantana">@javisantana&lt;/a> - it has a statistics system to plan queries that can be used by the user when accuracy does not matter, for example, use &amp;ldquo;explain select * from table&amp;rdquo; to replace count() or use &amp;ldquo;_postgis_selectivity&amp;rdquo; to know how many points fall into a bbox.&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/l_avrot">@l_avrot&lt;/a> - The fact that we can use vim editor in psql&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/mashd">@mashd&lt;/a> - Logical decoding for change data capture.&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/4thdoctor_scarf">@4thdoctor_scarf&lt;/a> - the MVCC. If I had a penny per each time I&amp;rsquo;ve explained how really works, I&amp;rsquo;ll be a millionaire now :)&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/docteur_klein">@docteur_klein&lt;/a> - \timing in psql&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/thibaut_barrere">@thibaut_barrere&lt;/a> - From times to times I find foreign data wrapper with CSV files very helpful (&amp;amp; easy to setup with Ruby&amp;rsquo;s Sequel library) &lt;a href="https://gist.github.com/thbar/0093ee54c5a61aa5a0c5a4737fc3bd45">https://gist.github.com/thbar/0093ee54c5a61aa5a0c5a4737fc3bd45&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/steve_touw">@steve_touw&lt;/a> - &lt;a href="http://www.craigkerstiens.com/2016/09/11/a-tour-of-fdws/">Foreign data wrappers&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/roimartinez_gis">@roimartinez_gis&lt;/a> - Clearly aggregate functions make live very simple :) .&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/whalesalad">@whalesalad&lt;/a> - select where datetime &amp;gt; yesterday and other natural language time queries.&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/pwramsey">@pwramsey&lt;/a> - At the hacker level: hooks. So many cool hooks, and finding them, a bit of an easter egg hunt.&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/pwramsey">@pwramsey&lt;/a> - At the user level: the quality and breadth of tsearch still feels radically under appreciated; same for ranges.&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/TomCiopp">@TomCiopp&lt;/a> - PostGIS / PgRouting&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/peterbe">@peterbe&lt;/a> - psql -l&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/jbrancha">@jbrancha&lt;/a> - In psql, setting ‘\x auto’ so that wide table results get displayed vertically!&lt;/li>
&lt;li>&lt;a href="https://www.twitter.com/westermanndanie">@westermanndanie&lt;/a> - &lt;code>\watch&lt;/code>&lt;/li>
&lt;/ul>
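&lt;p>&lt;em>A few of these gems take under a minute to try. A quick sketch, assuming a hypothetical &lt;code>orders&lt;/code> table:&lt;/em>&lt;/p>

```sql
-- Date math with intervals: orders from the last 7 days
SELECT * FROM orders WHERE created_at > now() - interval '7 days';

-- Window functions: number each order within its site, oldest first
SELECT id, site_id,
       row_number() OVER (PARTITION BY site_id ORDER BY created_at)
FROM orders;
```

&lt;p>&lt;em>And in psql itself: &lt;code>\x auto&lt;/code> flips wide results to a vertical layout, and suffixing a query with &lt;code>\watch 5&lt;/code> re-runs it every five seconds.&lt;/em>&lt;/p>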
&lt;p>Well, that was quite the list. And I&amp;rsquo;m sure we&amp;rsquo;ve only scratched the surface. Have something not on the list that you feel qualifies as a hidden gem? &lt;a href="https://www.twitter.com/craigkerstiens">Let&amp;rsquo;s hear about it&lt;/a>&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Sourcing developer marketing content</title><link>/2017/12/28/Sourcing-developer-marketing-content/</link><pubDate>Thu, 28 Dec 2017 12:55:56 -0800</pubDate><guid>/2017/12/28/Sourcing-developer-marketing-content/</guid><description>&lt;p>I spend a lot of time with dev tool and data companies. I think I&amp;rsquo;ve more or less banished myself to a life of working in the space, no consumer products for me. In that world a common topic that comes up amongst marketing teams is how do I get my team to contribute to content? Sometimes the person already has an idea of how they want the team to jump onto the bandwagon of their plan, sometimes they&amp;rsquo;re entirely open minded. I won&amp;rsquo;t get into pros and cons of various approaches here, rather after sharing some of my approaches in one on one settings I thought it could be useful to share more broadly here.&lt;/p>
&lt;h3 id="turn-emails-into-blogs" >
&lt;div>
Turn emails into blogs
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;m a big fan of high quality content when it comes to developer-focused products. Yes, you can publish x vs. y with FUD and get some traction with it. You can publish high-level customer stories that don&amp;rsquo;t get down into the details. But publishing high quality deep technical content that other engineers appreciate gives you a huge head start in getting buy-in from your customers. The best thing about much of this type of content is you likely already have it sitting within your organization.&lt;/p>
&lt;p>Engineers spend a lot of time being thorough and articulate in their emails to their peers. If you see a well written email from one engineer to the engineers@ list and then others chime in either with questions, follow-up, or praise then it&amp;rsquo;s likely a great candidate for a blog post. We recently had one of these at Citus where another engineer replied with &amp;ldquo;Nice blog post :)&amp;rdquo;, a few weeks later we had &lt;a href="https://www.citusdata.com/blog/2017/12/22/distributed-count-vs-hyperloglog/">a post&lt;/a> for the rest of the world to read.&lt;/p>
&lt;p>The thing about emails though is the engineers won&amp;rsquo;t usually take and turn them into a blog post. Get a technically minded person that likes to write and put some framing around it, pull out relevant details, and collaborate with the engineer. I&amp;rsquo;ve often found going from one of these emails to a fully published blog post can be anywhere from 2-8 hrs (including reviews and edits).&lt;/p>
&lt;h3 id="beginner-mindset-with-tickets" >
&lt;div>
Beginner mindset with tickets
&lt;/div>
&lt;/h3>
&lt;p>As your product people and engineers spend more and more time with the product they become numb to the cool feature from 2 years ago. That initial awe is just expected. Yes there are new advanced features and tech that is being built&amp;hellip; but most of your users aren&amp;rsquo;t the advanced power users. You still need to speak to the people just now onboarding and discovering you.&lt;/p>
&lt;p>The best way to maintain this beginner mindset is to capture the interactions with those that are newer to your product. Over the years I&amp;rsquo;ve made the practice of cataloging support tickets and the type of issue encountered. Any time the same issue is seen several times it becomes a candidate for a blog post or at the very least being clearly documented.&lt;/p>
&lt;h3 id="make-sales-engineering-obsolete" >
&lt;div>
Make sales engineering obsolete
&lt;/div>
&lt;/h3>
&lt;p>Your sales engineers spend weeks with customers helping them implement complex solutions with your product. In a lot of cases they get really good at helping re-implement a similar solution over and over. Similar to your process with support tickets they&amp;rsquo;re a great source to take what they&amp;rsquo;ve done a few times over and turn it into a blog post.&lt;/p>
&lt;p>Usually after the first time implementing they have some ideas but there are rough edges. The second or third time they&amp;rsquo;ve helped a customer implement a particular approach it becomes a candidate for a post.&lt;/p>
&lt;h3 id="training-and-documentation-dont-have-to-be-siloed" >
&lt;div>
Training and Documentation don&amp;rsquo;t have to be siloed
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;m a big fan of not doing anything for just one reason. You should absolutely document your product, and if you need to do training to support customers go for it. But that doesn&amp;rsquo;t mean those bits of content can&amp;rsquo;t be re-used. Documentation is the place for the factual account of how something works; a blog post can then add narration of the end-to-end experience.&lt;/p>
&lt;h2 id="the-process" >
&lt;div>
The process
&lt;/div>
&lt;/h2>
&lt;p>If you&amp;rsquo;re asking the question how do I get more good content out there, then you&amp;rsquo;re not leveraging the content you&amp;rsquo;re already sitting on top of. Though perhaps equally important is getting a process in place. A few tips for that:&lt;/p>
&lt;ul>
&lt;li>Don&amp;rsquo;t blanket ask to a list for volunteers, ask individuals and about specific posts. &lt;em>If you see a great email, respond and ask if you can work with them to get it turned into a blog post&lt;/em>&lt;/li>
&lt;li>Have a defined process, and shepherd each post through it&lt;/li>
&lt;li>Separate messaging/flow from grammar&lt;/li>
&lt;li>My typical process is: outline -&amp;gt; draft -&amp;gt; feedback from 2-3 on draft -&amp;gt; finalize draft -&amp;gt; feedback from 4-5 (often some external parties) -&amp;gt; publish&lt;/li>
&lt;/ul>
&lt;h2 id="whatd-i-miss" >
&lt;div>
What&amp;rsquo;d I miss?
&lt;/div>
&lt;/h2>
&lt;p>Have other tips that are key to the process? Know of other places where content may already exist but isn&amp;rsquo;t being leveraged? Would love to &lt;a href="https://www.twitter.com/craigkerstiens">hear your thoughts&lt;/a>.&lt;/p></description></item><item><title>Guidance on performing retrospectives</title><link>/2017/12/26/Guidance-on-performing-retrospectives/</link><pubDate>Tue, 26 Dec 2017 12:55:56 -0800</pubDate><guid>/2017/12/26/Guidance-on-performing-retrospectives/</guid><description>&lt;p>In my career I&amp;rsquo;ve had to conduct a number of retrospectives. Ahead of them it already sucked, there was an outage at some point, customers were impacted, and it was our fault. Never was it solely on our underlying infrastructure provider (AWS or Heroku), nope the blame was on us and we&amp;rsquo;d failed in some way. And as soon as the incident was resolved, it wasn&amp;rsquo;t time to go home and decompress with a beer, it was time start the process of a retrospective.&lt;/p>
&lt;p>Finding the motivation to get right back to work is tough, but not losing time is important. There is probably a lot out there on retrospectives, and in general I was well rehearsed at them. But since I&amp;rsquo;d not performed a large-scale one in a few years I found myself rusty and thought it&amp;rsquo;d be good to share some of our process.&lt;/p>
&lt;h3 id="capture-details-immediately" >
&lt;div>
Capture details immediately
&lt;/div>
&lt;/h3>
&lt;p>It may not be clear if you&amp;rsquo;ve not been involved in many, but a retrospective is more than just a meeting to discuss what happened and how to fix it. It&amp;rsquo;s an overall process that begins with capturing thorough details of what happened, starting with a timeline. The best thing to do is capture the details while they&amp;rsquo;re fresh. Start with a google doc and simply document the timeline of everything. Capture chat logs that are relevant while they&amp;rsquo;re fresh in history and easy to find.&lt;/p>
&lt;p>The start of an outage likely wasn&amp;rsquo;t the start of the timeline; there may have been something that happened days, weeks, or even years ago. Don&amp;rsquo;t just start from the time things went offline, go back to the causes as much as possible. If code committed a year ago was the offender, make sure to note that.&lt;/p>
&lt;h3 id="running-the-retrospective-the-meeting-part" >
&lt;div>
Running the retrospective (the meeting part)
&lt;/div>
&lt;/h3>
&lt;p>There are a number of good practices for running the retrospective itself. There are also a lot of different formats, all valid, each with their own pros and cons. You can do a basic timeline, a what went well/didn&amp;rsquo;t review, or a &lt;a href="https://en.wikipedia.org/wiki/5_Whys">five whys&lt;/a> analysis. I tend to prefer a clean and dry analysis of the timeline, what went well, what didn&amp;rsquo;t, and what we&amp;rsquo;re doing about it.&lt;/p>
&lt;p>Some key tips to help a retrospective meeting be productive:&lt;/p>
&lt;ul>
&lt;li>Explicitly set time bounds for each activity ahead of time, more so than maybe any other meeting it&amp;rsquo;s important to get through all your agenda. Hard time limits on the planned items is how you accomplish this.&lt;/li>
&lt;li>Spend at least some time on what went well, retrospectives aren&amp;rsquo;t fun. Spending some time on the good parts of your process and response isn&amp;rsquo;t wasted, just don&amp;rsquo;t be overly self-back-patting (that&amp;rsquo;s not a word but you get it).&lt;/li>
&lt;li>Don&amp;rsquo;t discuss people, or rather don&amp;rsquo;t point fingers. Yes, people will come up, but it&amp;rsquo;s about the technical and process errors not the person that performed them.&lt;/li>
&lt;/ul>
&lt;h3 id="the-important-part" >
&lt;div>
The important part
&lt;/div>
&lt;/h3>
&lt;p>Every step in the retrospective is important, but the goal of them all is to get to how you can improve. With any retrospective there are likely two categories of improvements that will surface. The first is bugs that caused the issue or engineering that could go in place to help with the specific issue. The second are process improvements. If you don&amp;rsquo;t have improvements in both areas then spend more time thinking on the one you&amp;rsquo;re missing.&lt;/p>
&lt;p>Improvements shouldn&amp;rsquo;t be isolated to the exact issue you saw. Yes you may see the exact same issue again, but there is also a lot more you can draw out that helps improve overall quality. It&amp;rsquo;s inevitable you&amp;rsquo;ll see different issues in the future, thinking of how you can improve your systems and processes to catch those future issues is time well spent.&lt;/p>
&lt;h3 id="sharing-the-details" >
&lt;div>
Sharing the details
&lt;/div>
&lt;/h3>
&lt;p>You&amp;rsquo;ve done the hard work in the above, but it&amp;rsquo;s still good to share publicly and transparently. Within your public retrospective I tend to follow:&lt;/p>
&lt;ol>
&lt;li>Apologize, &lt;strong>and mean it&lt;/strong>&lt;/li>
&lt;li>Show a firm understanding of your systems, and communicate the problem. Don&amp;rsquo;t try to be fancy technically, but don&amp;rsquo;t be too high-level. Think goldilocks.&lt;/li>
&lt;li>Share what you&amp;rsquo;re doing to improve&lt;/li>
&lt;/ol>
&lt;p>&lt;em>Credit to Mark for being the first person I&amp;rsquo;m aware of to lay it out like the above&lt;/em>&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>&lt;em>Thanks to &lt;a href="https://www.twitter.com/markimbriaco">Mark Imbriaco&lt;/a>, &lt;a href="https://www.twitter.com/blakegentry">Blake Gentry&lt;/a>, &lt;a href="https://www.twitter.com/danfarina">Daniel Farina&lt;/a>, &lt;a href="https://www.twitter.com/lukasfittl">Lukas Fittl&lt;/a>, &lt;a href="https://www.twitter.com/leinweber">Will Leinweber&lt;/a> for input and feedback along the way.&lt;/em>&lt;/p>
&lt;p>Looking for more resources on the topic? Make sure to check out these two talks:&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://www.heavybit.com/library/video/every-minute-counts-coordinating-herokus-incident-response/">Every Minute Counts: Coordinating Heroku&amp;rsquo;s Incident Response&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://vimeo.com/67178303">Monitorama 2013 - Mark Imbriaco&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Postgres - the non-code bits</title><link>/2017/10/31/Postgres-the-non-code-bits/</link><pubDate>Tue, 31 Oct 2017 12:55:56 -0800</pubDate><guid>/2017/10/31/Postgres-the-non-code-bits/</guid><description>&lt;p>Postgres is an interesting open source project. It&amp;rsquo;s truly one of a kind, it has its own license to prove it as opposed to falling under something like Apache or GPL. The Postgres community structure is something that is pretty well defined if you&amp;rsquo;re involved in the community, but to those outside it&amp;rsquo;s likely a little less clear. In case you&amp;rsquo;re curious to learn more about the community here&amp;rsquo;s a rundown of a few various aspects of it:&lt;/p>
&lt;h3 id="postgresql-license" >
&lt;div>
PostgreSQL License
&lt;/div>
&lt;/h3>
&lt;p>Let&amp;rsquo;s start with the legal part first. &lt;em>First IANAL&lt;/em>. PostgreSQL is under its own license. For those who don&amp;rsquo;t regularly follow software licensing it&amp;rsquo;s extremely liberal and flexible. You can take Postgres, fork it, change it, package it up, and resell it. This is actually one of the reasons you see Postgres at the core of so many other databases like ParAccel, Aster Data, etc. That, and it&amp;rsquo;s such a solid code base that is capable of being extended. &lt;em>I once heard someone describe how they don&amp;rsquo;t really like writing C, but they enjoy writing Postgres C ;)&lt;/em>&lt;/p>
&lt;p>&lt;em>A thing you can&amp;rsquo;t do is profit off the PostgreSQL logo without approval from the core team.&lt;/em>&lt;/p>
&lt;h3 id="the-people" >
&lt;div>
The people
&lt;/div>
&lt;/h3>
&lt;p>Within the Postgres community there are 2 major sets of people.&lt;/p>
&lt;h3 id="the-core-team" >
&lt;div>
The core team
&lt;/div>
&lt;/h3>
&lt;p>The core team is a smaller team within the Postgres community. The core team is effectively a &lt;a href="https://www.postgresql.org/developer/core/">steering committee&lt;/a> for Postgres. They&amp;rsquo;re responsible for coordinating releases, handling confidential issues (read: security issues), managing permissions around PostgreSQL code and infrastructure, and defining policy.&lt;/p>
&lt;p>The core team is a very small list of people, at the moment 5 individuals.&lt;/p>
&lt;h3 id="contributors" >
&lt;div>
Contributors
&lt;/div>
&lt;/h3>
&lt;p>Yes, anyone can contribute to Postgres, and with each release there is a laundry list of people who wrote some code that went into Postgres. In fact Postgres 10 had &lt;a href="https://www.postgresql.org/docs/current/static/release-10.html#idm46046833759776">325 people&lt;/a> that contributed in some form. That said, there is a hierarchy that exists. The two biggest groups are committers and major contributors.&lt;/p>
&lt;p>&lt;strong>Committers&lt;/strong> gain access after years of contributing to Postgres showing sustained commitment to the project. New committers are voted on each year at PgCon, which happens in May in Ottawa. If you&amp;rsquo;re ever curious about what&amp;rsquo;s coming and want to get deep in the internals of Postgres, it&amp;rsquo;s one conference to check out. Once you do gain your commit bit you&amp;rsquo;re expected to keep contributing to the project every couple of years. And of course there are a &lt;a href="https://wiki.postgresql.org/wiki/Committers">number of qualifications&lt;/a>, such as contributing high quality code, and perhaps most key, helping review others&amp;rsquo; contributions.&lt;/p>
&lt;p>&lt;strong>Major contributors&lt;/strong> are another notable group. Major contributors don&amp;rsquo;t have commit access themselves, but are held in high regard for consistently contributing major features as well as providing review for others.&lt;/p>
&lt;p>&lt;strong>Contributors&lt;/strong> in general are another area worth calling out. While they may not have a flagship feature to their name like the major contributors, Postgres is what it is because of the contributions of everyone.&lt;/p>
&lt;h3 id="postgresql-the-company" >
&lt;div>
PostgreSQL the company
&lt;/div>
&lt;/h3>
&lt;p>Well, it turns out there isn&amp;rsquo;t a company behind Postgres; it&amp;rsquo;s one thing that makes it unique: no one can ever &amp;ldquo;own it&amp;rdquo;. There are some official PostgreSQL non-profits though, in particular the US non-profit and the EU non-profit. These non-profits ensure that the core guidelines are enforced and also give coverage for the community to help put on official community conferences. A few of these happen each year, which include:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.pgcon.org/2018/">PGCon&lt;/a> - The hackers conference&lt;/li>
&lt;li>&lt;a href="https://2017.postgresopen.org/">PostgresOpen SV&lt;/a> - A consolidation of PostgresOpen and PGConf Silicon Valley&lt;/li>
&lt;li>&lt;a href="https://postgresopen.org/">PGConf EU&lt;/a> - The largest PG European Conference which moves around each year&lt;/li>
&lt;/ul>
&lt;p>If you&amp;rsquo;re looking for a way to support the PostgreSQL non-profit organization I&amp;rsquo;d encourage you to consider joining &lt;a href="https://postgresql.us/">PostgreSQL.us&lt;/a>.&lt;/p>
&lt;h3 id="engaging" >
&lt;div>
Engaging
&lt;/div>
&lt;/h3>
&lt;p>So you want to jump into the community, where do you even start? The first place I&amp;rsquo;d encourage is to subscribe to the &lt;a href="https://www.postgresql.org/list/">mailing list&lt;/a> or check out the &lt;a href="http://postgres-slack.herokuapp.com/">slack channel&lt;/a>. The users mailing list is a great one to just jump in, help answer questions, and see what people need help with. The hackers list is where you go to get a peek at all the fun debates/discussions/development.&lt;/p>
&lt;p>If you&amp;rsquo;re thinking about contributing it&amp;rsquo;s a good idea to lurk on the hackers list for a bit first. Then when the commitfest comes, chip in and help review some patches and do some testing. Oh and of course, you can always blog about what you&amp;rsquo;re doing with Postgres, and I&amp;rsquo;ll aim to get it included in &lt;a href="https://www.postgresweekly.com">Postgres Weekly&lt;/a>.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Dear Postgres</title><link>/2017/10/12/Dear-Postgres/</link><pubDate>Thu, 12 Oct 2017 12:55:56 -0800</pubDate><guid>/2017/10/12/Dear-Postgres/</guid><description>&lt;!-- raw HTML omitted -->
&lt;p>Dear Postgres,&lt;/p>
&lt;p>I&amp;rsquo;ve always felt an affinity for you in my 9 years of working with you. I know others have known you longer, but that doesn&amp;rsquo;t mean they love you more. Years ago when others complained about your rigidness or that you weren&amp;rsquo;t as accommodating as others I found solace in your steadfast values:&lt;/p>
&lt;ol>
&lt;li>Don&amp;rsquo;t lose data&lt;/li>
&lt;li>Adhere to standards&lt;/li>
&lt;li>Move forward with a balancing act between new fads of the day while still continuously improving&lt;/li>
&lt;/ol>
&lt;p>You&amp;rsquo;ve been there and seen it all. Years ago you were being disrupted by XML databases. As companies made heavy investment into what such a document database would do for their organization you proceeded to &amp;ldquo;simply&amp;rdquo; add a datatype that accomplished the same and brought your years of progress along with it.&lt;/p>
&lt;p>In the early years you had the standard index format, the b-tree, that most database engines leveraged. Then quietly but confidently you started adding more. Then came K-nearest neighbor, generalized inverted indexes (GIN), and generalized search tree (GiST), only to be followed by space-partitioned GiST and block range indexes (BRIN). Now the only question is which do I use?&lt;/p>
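&lt;p>&lt;em>To make that menu concrete, a hedged sketch; the table and columns are invented purely for illustration:&lt;/em>&lt;/p>

```sql
CREATE TABLE readings (
  id          bigint PRIMARY KEY,  -- the primary key gets a b-tree automatically
  tags        jsonb,
  location    point,
  recorded_at timestamptz
);

CREATE INDEX ON readings USING gin (tags);          -- GIN: containment lookups in jsonb
CREATE INDEX ON readings USING gist (location);     -- GiST: geometric and nearest-neighbor searches
CREATE INDEX ON readings USING brin (recorded_at);  -- BRIN: tiny indexes for naturally ordered data
```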
&lt;!-- raw HTML omitted -->
&lt;p>All the while there was this other camp using you for something that felt cool but outside my world: GIS. GIS, geographical information systems, was something I thought only civil engineers used. Then GPS came along, then the iPhone and location-based devices, and suddenly I wanted to find the nearest path to my Peets, or manage geographical regions for my grocery delivery service. PostGIS had been there all along building up this powerful feature set. Sadly, to this day I still mostly marvel from the sidelines at this whole other feature set I long to take advantage of&amp;hellip; &lt;em>one day&amp;hellip; one day&lt;/em>.&lt;/p>
&lt;p>A little over 5 years ago I fell in love with your fast-improving analytical capabilities. No, you weren&amp;rsquo;t an MPP system yet, but here came window functions and CTEs, then I almost understood recursive CTEs &lt;em>(still working on that one)&lt;/em>. I can iterate over data in a recursive fashion without PL/PgSQL? Yes please! I only want to use it more.&lt;/p>
&lt;p>And then five years ago, document stores start taking over the world. I feel like I&amp;rsquo;ve seen this story before, wasn&amp;rsquo;t XML going to change the internet? Enter JSON, the JSON datatype, and JSONB. Wow, this is really nice to mix relational, document storage, join against things. I suddenly don&amp;rsquo;t get why more don&amp;rsquo;t take this flexible approach to building on a good foundation and layering on the refinements.&lt;/p>
&lt;p>Extensions! Where have you been all my life? There’s &lt;a href="https://www.citusdata.com">Citus&lt;/a>, and &lt;a href="https://github.com/aggregateknowledge/postgresql-hll">HyperLogLog&lt;/a>, and &lt;a href="https://www.zombodb.com/">ZomboDB&lt;/a>, with each I can add functionality to Postgres without it being limited to the standard release, they can be in C or not. Wait, all along so much has been built on this foundation? PostGIS, full-text search, hstore? I like all those things, why didn&amp;rsquo;t you tell me all along about this foundation? Postgres, I like what I&amp;rsquo;m seeing how you&amp;rsquo;re allowing others to do more without having it be in the core of Postgres. This extension stuff is really kinda cool that it&amp;rsquo;s Postgres and then some, kinda like C and then ++, wait nevermind scratch that analogy.&lt;/p>
&lt;p>Sorry, I&amp;rsquo;ve rambled a bit. You&amp;rsquo;re a little over twenty years old now. I&amp;rsquo;ve known you for nearly ten of those years, so I know there&amp;rsquo;s so much about your background I don&amp;rsquo;t know; I hope we get to spend the time together to share it all. This 10 release is really an exciting one to me. We&amp;rsquo;ve spent all this time together and I feel like each passing year the bond grows fonder.&lt;/p>
&lt;p>Now you&amp;rsquo;ve brought me better parallelism so I can further utilize my system resources. I now have partitioning. Thank you! I don&amp;rsquo;t have to roll my own hacks to help age out old data for my time series database. Logical replication will make so many other things possible, such as more online upgrades and integration with other systems.&lt;/p>
&lt;p>Postgres, I just want to say thank you for the past ten years together. Thank you for all you’ve done and for all you’ll continue to do in the future.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Tracking and managing your Postgres connections</title><link>/2017/09/18/postgres-connection-management/</link><pubDate>Mon, 18 Sep 2017 12:55:56 -0800</pubDate><guid>/2017/09/18/postgres-connection-management/</guid><description>&lt;p>Managing connections in Postgres is a topic that seems to come up several times a week in conversations. I&amp;rsquo;ve written some about scaling your connections and the right approach when you truly need a high level of connections, which is to use a connection pooler like pgBouncer. But what do you do before that point and how can you better track what is going on with your connections in Postgres?&lt;/p>
&lt;p>Postgres under the covers has a lot of metadata about both historical and current activity against a system. Within Postgres you can run the following query which will give you a few results:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#66d9ef">count&lt;/span>(&lt;span style="color:#f92672">*&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">state&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> pg_stat_activity
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span> &lt;span style="color:#ae81ff">2&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">count&lt;/span> &lt;span style="color:#f92672">|&lt;/span> &lt;span style="color:#66d9ef">state&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-------+-------------------------------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#ae81ff">7&lt;/span> &lt;span style="color:#f92672">|&lt;/span> active
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#ae81ff">69&lt;/span> &lt;span style="color:#f92672">|&lt;/span> idle
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#ae81ff">26&lt;/span> &lt;span style="color:#f92672">|&lt;/span> idle &lt;span style="color:#66d9ef">in&lt;/span> &lt;span style="color:#66d9ef">transaction&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#ae81ff">11&lt;/span> &lt;span style="color:#f92672">|&lt;/span> idle &lt;span style="color:#66d9ef">in&lt;/span> &lt;span style="color:#66d9ef">transaction&lt;/span> (aborted)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>(&lt;span style="color:#ae81ff">4&lt;/span> &lt;span style="color:#66d9ef">rows&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Time: &lt;span style="color:#ae81ff">30&lt;/span>.&lt;span style="color:#ae81ff">337&lt;/span> ms
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Each of these states is useful in determining what you should do to better manage your connection count. All of these numbers are worth recording every 30 seconds or so and charting in your own internal monitoring. Let&amp;rsquo;s break down each:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>active&lt;/strong> - This is currently running queries, in a sense this is truly how many connections you may require at a time&lt;/li>
&lt;li>&lt;strong>idle&lt;/strong> - This is where you have opened a connection to the DB (most frameworks do this and maintain a pool of them), but nothing is happening. This is the one area that a connection pooler like pgBouncer can most help.&lt;/li>
&lt;li>&lt;strong>idle in transaction&lt;/strong> - This is where your app has run a &lt;code>BEGIN&lt;/code> but it&amp;rsquo;s now waiting somewhere in a transaction and not doing work.&lt;/li>
&lt;/ul>
&lt;p>For &lt;strong>idle&lt;/strong>, as mentioned above, it&amp;rsquo;s one you do want to monitor, and if you see a high number here it&amp;rsquo;s worth investing in setting up pgBouncer.&lt;/p>
&lt;p>For &lt;strong>idle in transaction&lt;/strong>, this one is a bit more interesting. When first investigating, what you likely want is an idea of how old those transactions are. You can do this by querying pg_stat_activity, filtering for where the state is &lt;code>idle in transaction&lt;/code>, and checking how long each has been idle. For ones that have been hanging around too long you may want to manually kill them.&lt;/p>
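&lt;p>A sketch of that investigation (the five-minute threshold and the pid passed to &lt;code>pg_terminate_backend&lt;/code> are illustrative):&lt;/p>
&lt;pre tabindex="0">&lt;code>-- find sessions that have been idle in a transaction for a while
SELECT pid,
       now() - state_change AS idle_for,
       query
FROM pg_stat_activity
WHERE state = &amp;#39;idle in transaction&amp;#39;
  AND now() - state_change &amp;gt; &amp;#39;5 minutes&amp;#39;::interval
ORDER BY idle_for DESC;

-- kill a specific offender by its pid
SELECT pg_terminate_backend(12345);
&lt;/code>&lt;/pre>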
&lt;p>If you find stale transactions hanging around, whether for days, hours, or even just a few minutes, you may want to set a default that kills them automatically.&lt;/p>
&lt;p>To help with this Postgres has a nice feature called &lt;code>statement_timeout&lt;/code>. A statement timeout will automatically kill queries that run longer than the allotted time. You can set this at both a global level and for a specific session. To do this at the database level you&amp;rsquo;d run &lt;code>alter database dbnamehere set statement_timeout = 60000;&lt;/code>, which is 60 seconds (the value is in milliseconds). To do so for a given session simply run &lt;code>set statement_timeout = 60000;&lt;/code>.&lt;/p>
&lt;p>For &lt;em>idle in transaction&lt;/em> sessions that have been running too long there is a separate setting you can configure in a similar fashion, &lt;code>idle_in_transaction_session_timeout&lt;/code> (on Postgres 9.6 and up). Setting both &lt;code>statement_timeout&lt;/code> and &lt;code>idle_in_transaction_session_timeout&lt;/code> will help with cancelling long-running queries and transactions.&lt;/p>
&lt;p>Keeping your connection limits in check should lead to a much healthier performing database and thus app.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Better database migrations in Postgres</title><link>/2017/09/10/better-postgres-migrations/</link><pubDate>Sun, 10 Sep 2017 12:55:56 -0800</pubDate><guid>/2017/09/10/better-postgres-migrations/</guid><description>&lt;p>As your database grows and scales there are some operations that you need to take more care of than you did when you were just starting. When working with your application in your dev environment you may not be fully aware of the cost of some operations until you run them against production. And at some point most of us have been guilty of it, running some migration that starts at 5 minutes, then 15 minutes in it&amp;rsquo;s still running, and suddenly production traffic is impacted.&lt;/p>
&lt;p>There are two operations that tend to happen quite frequently, each with some straightforward approaches to mitigate having any noticeable amount of downtime. Let&amp;rsquo;s look at each operation, how it works, and how you can approach it in a safer way.&lt;/p>
&lt;h3 id="adding-new-columns" >
&lt;div>
Adding new columns
&lt;/div>
&lt;/h3>
&lt;p>Adding a new column is actually quite cheap in Postgres. When you do this it updates its underlying tracking of the columns that exist, which is almost instant. The part that becomes expensive is when you have some constraint against the column. A constraint could be a primary or foreign key, or some uniqueness constraint. Here Postgres has to scan through all the records in the table to ensure that it&amp;rsquo;s not being violated. A constraint such as &lt;code>not null&lt;/code> does come up, but it is not the most common cause of slowness.&lt;/p>
&lt;p>The most common reason for slowness when adding a new column is that most frameworks make it very simple for you to set a default value for the new column. It&amp;rsquo;s one thing to do this for all new records, but when you do it on an existing table it means the database has to read all the records and re-write them with the new default value attached. This isn&amp;rsquo;t so bad for a table with a few hundred records, but for a few hundred million, run it and then go get yourself coffee, or lunch, or a 5-course meal, because you&amp;rsquo;ll be waiting for a while.&lt;/p>
&lt;p>In short, &lt;code>not null&lt;/code> and setting a default value (on creation) of your new column will cause you pain. The solution is to not do those things. But what if you want a default value and don&amp;rsquo;t want to allow &lt;code>nulls&lt;/code>? There are a few simple steps you can take, essentially splitting your single migration into four:&lt;/p>
&lt;ol>
&lt;li>Add your new column &lt;em>that allows nulls&lt;/em>&lt;/li>
&lt;li>Start writing your default value on all new records and updates&lt;/li>
&lt;li>Gradually backfill the default value&lt;/li>
&lt;li>Apply your constraint&lt;/li>
&lt;/ol>
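&lt;p>As a sketch, those four steps might look like the following (the &lt;code>users&lt;/code> table, &lt;code>plan&lt;/code> column, and batch range are hypothetical, and step 2 happens in your application code):&lt;/p>
&lt;pre tabindex="0">&lt;code>-- 1. add the column, allowing nulls and with no default
ALTER TABLE users ADD COLUMN plan text;

-- 2. in the application, write &amp;#39;free&amp;#39; for all new and updated records

-- 3. gradually backfill existing rows in small batches
UPDATE users SET plan = &amp;#39;free&amp;#39;
WHERE id BETWEEN 1 AND 10000 AND plan IS NULL;

-- 4. once backfilled, apply the default and constraint
ALTER TABLE users ALTER COLUMN plan SET DEFAULT &amp;#39;free&amp;#39;;
ALTER TABLE users ALTER COLUMN plan SET NOT NULL;
&lt;/code>&lt;/pre>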
&lt;p>Yes, this is a little more work, but it doesn&amp;rsquo;t impact production in nearly the same magnitude.&lt;/p>
&lt;h3 id="indexes" >
&lt;div>
Indexes
&lt;/div>
&lt;/h3>
&lt;p>Index creation, like most DDL operations, holds a lock while it&amp;rsquo;s occurring, which means any new writes have to wait for the index to be created before they flow through. When first creating the table, or on a small table, this time is barely noticeable. On a large database though, you can wait minutes or possibly even hours. &lt;em>It&amp;rsquo;s a bit ironic when you think about it that adding an index to speed things up can slow things down while it&amp;rsquo;s happening.&lt;/em>&lt;/p>
&lt;p>Postgres of course has the answer for this with &lt;code>CONCURRENT&lt;/code> index creation. What this does is gradually build up the index in the background. You can create your index concurrently with &lt;code>CREATE INDEX CONCURRENTLY&lt;/code>. Once the index has been built successfully and is valid, Postgres will start using it for queries.&lt;/p>
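&lt;p>A quick sketch (the table and column names are hypothetical):&lt;/p>
&lt;pre tabindex="0">&lt;code>-- builds in the background without blocking writes;
-- note it cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_users_created_at ON users (created_at);
&lt;/code>&lt;/pre>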
&lt;h3 id="a-tool-to-help" >
&lt;div>
A tool to help
&lt;/div>
&lt;/h3>
&lt;p>It&amp;rsquo;s a good practice to understand what is happening when you run a migration and its performance impact. That said, you don&amp;rsquo;t have to manage this all on your own. For Rails at least, there&amp;rsquo;s a tool that helps enforce these practices as you&amp;rsquo;re developing, to catch issues earlier. &lt;a href="https://github.com/ankane/strong_migrations">Strong migrations&lt;/a> aims to catch many of these expensive operations for you; if you&amp;rsquo;re on Rails consider giving it a look.&lt;/p>
&lt;p>Have other tools or tips that can help with database migrations in Postgres? &lt;a href="https://www.twitter.com/craigkerstiens">Drop me a note&lt;/a> and I&amp;rsquo;ll work to add them to the list.&lt;/p></description></item><item><title>Postgres backups: Logical vs. Physical an overview</title><link>/2017/09/03/postgres-backups-physical-vs-logical/</link><pubDate>Sun, 03 Sep 2017 12:55:56 -0800</pubDate><guid>/2017/09/03/postgres-backups-physical-vs-logical/</guid><description>&lt;p>It&amp;rsquo;s not a very disputed topic that you should back up your database, and further, test your backups. What is a little less discussed, at least for Postgres, is the types of backups that exist. Within Postgres there are two forms of backups, and understanding them is a useful foundation for anyone working with Postgres. The two backup types are&lt;/p>
&lt;ol>
&lt;li>Physical: the actual bytes on disk&lt;/li>
&lt;li>Logical: a more portable format&lt;/li>
&lt;/ol>
&lt;p>Let&amp;rsquo;s dig into each a bit more so you can better assess which makes sense for you.&lt;/p>
&lt;h3 id="logical-backups" >
&lt;div>
Logical backups
&lt;/div>
&lt;/h3>
&lt;p>Logical backups are the most well known type within Postgres. This is what you get when you run &lt;code>pg_dump&lt;/code> against a database. There are a number of different formats you can get from &lt;a href="http://postgresguide.com/utilities/backup-restore.html">logical backups&lt;/a> and Postgres does a good job of making it easy to compress and configure this backup how you see fit.&lt;/p>
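&lt;p>As a quick sketch (the database and file names here are placeholders):&lt;/p>
&lt;pre tabindex="0">&lt;code>$ pg_dump -Fc mydb &amp;gt; mydb.dump    # custom format, compressed by default
$ pg_restore --dbname=mynewdb mydb.dump
&lt;/code>&lt;/pre>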
&lt;p>When a logical backup is run against a database it is not throttled, which introduces a noticeable load on your database.&lt;/p>
&lt;p>As it&amp;rsquo;s reading the data from disk and generating (in layman&amp;rsquo;s terms) a bunch of SQL &lt;code>INSERT&lt;/code> statements, it has to actually see the data. It&amp;rsquo;s of note that on older Postgres databases (read: prior to 9.3) there were no checksums for your database. Checksums are just one tool to help check against data corruption. Because a logical dump has to actually read and generate the data to insert, it will discover any corruption that exists for you.&lt;/p>
&lt;p>This portable format is also very useful for pulling copies of production data into other environments: if you need a copy of production data on your local laptop, &lt;code>pg_dump&lt;/code> is the way to do it. Logical backups are per-database, and they also allow you to dump only certain tables.&lt;/p>
&lt;p>All in all logical backups bring some good features, but come at two costs:&lt;/p>
&lt;ul>
&lt;li>Load on your system&lt;/li>
&lt;li>The backup contains data as of the time when it ran&lt;/li>
&lt;/ul>
&lt;h3 id="physical-backups" >
&lt;div>
Physical backups
&lt;/div>
&lt;/h3>
&lt;p>Physical backups are another option when it comes to backing up your database. As we mentioned earlier it is the physical bytes on disk. To understand physical backups we need to know a bit more under the covers about how Postgres works.&lt;/p>
&lt;p>Postgres, under the covers, is essentially one giant append-only log. When you insert data it gets written to the log known as the write-ahead log (commonly called WAL). When you update data a new record gets written to the WAL. When you delete data a new record gets written to the WAL. Nearly all changes in Postgres, including those to indexes, cause an update to the WAL.&lt;/p>
&lt;p>With physical backups you need two things to be able to restore your database:&lt;/p>
&lt;ol>
&lt;li>A &lt;code>base backup&lt;/code>, which is a copy of the bytes on disk as of that point in time&lt;/li>
&lt;li>Additional segments of the WAL to put the database in some consistent state.&lt;/li>
&lt;/ol>
&lt;p>A physical backup only requires a small amount of WAL to restore the database to some valid state, &lt;em>but&lt;/em> this also gives you some new flexibility. With a base backup plus WAL you can start to replay transactions up to a specific point in time. This is often how point-in-time recovery is performed within Postgres. If you accidentally drop a table, yes&amp;hellip; it happens, you can:&lt;/p>
&lt;ol>
&lt;li>Find a base backup before you dropped the table&lt;/li>
&lt;li>Restore that base backup&lt;/li>
&lt;li>Replay WAL segments up to just before the time you dropped the table.&lt;/li>
&lt;/ol>
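&lt;p>A rough sketch of that replay step (this uses the &lt;code>recovery.conf&lt;/code> mechanism of Postgres as of this writing; the archive path and target time are placeholders):&lt;/p>
&lt;pre tabindex="0">&lt;code># recovery.conf, placed in the restored data directory
restore_command = &amp;#39;cp /mnt/wal_archive/%f %p&amp;#39;
recovery_target_time = &amp;#39;2017-09-01 11:59:00&amp;#39;
&lt;/code>&lt;/pre>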
&lt;p>&lt;em>If you&amp;rsquo;re considering setting up physical backups, consider using a tool like &lt;a href="https://www.citusdata.com/blog/2017/08/18/introducing-wal-g-faster-restores-for-postgres/">WAL-G&lt;/a> to help.&lt;/em>&lt;/p>
&lt;h3 id="logical-vs-physical-which-to-choose" >
&lt;div>
Logical vs. Physical which to choose
&lt;/div>
&lt;/h3>
&lt;p>Both are useful and provide different benefits. At smaller scale, say under 100 GB of data, logical backups via &lt;code>pg_dump&lt;/code> are something you should absolutely be doing. Because backups happen quickly on smaller databases, you may be able to get by without functionality like point-in-time recovery. At larger scale, as you approach 1 TB, physical backups start to become your only option. Because of the load introduced by logical backups and the time lapse between capturing them, they become less suitable for production.&lt;/p>
&lt;p>Hopefully this primer provides a high-level overview of the two primary types of backups that exist as options for Postgres. Of course there is much more depth to go into on each, but consider ensuring you have at least one of the two, if not both, in place. Oh, and make sure to test them; an un-tested backup isn&amp;rsquo;t a backup at all.&lt;/p></description></item><item><title>Postgres Open Silicon Valley line-up: First take</title><link>/2017/07/01/postgresopen-sv-line-up/</link><pubDate>Sat, 01 Jul 2017 12:55:56 -0800</pubDate><guid>/2017/07/01/postgresopen-sv-line-up/</guid><description>&lt;p>This year Postgres Open and PGConf SV have combined to create a bigger and better conference right in downtown San Francisco. &lt;em>I&amp;rsquo;m obviously biased as I&amp;rsquo;m one of the co-chairs, and I know every conference organizer says picking the talks was hard, but I&amp;rsquo;m especially excited for the line-up this year&lt;/em>. The hard part for me is going to be deciding which talks I miss out on because I&amp;rsquo;m sitting in the other session that&amp;rsquo;s ongoing. You can see the full list of &lt;a href="https://postgresql.us/events/sessions/pgopen2017/">talk and tutorial sessions&lt;/a>, but I thought it&amp;rsquo;d be fun to do a rundown of some of my favorites.&lt;/p>
&lt;h3 id="how-postgres-could-index-itselfhttpspostgresqluseventssessionspgopen2017session399-how-postgres-could-index-itself" >
&lt;div>
&lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/399-how-postgres-could-index-itself/">How Postgres could index itself&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/399-how-postgres-could-index-itself/">Postgres indexing itself&lt;/a> has long been on my wishlist. &lt;a href="https://www.twitter.com/akane">Andrew Kane&lt;/a> from Instacart, and creator of &lt;a href="https://github.com/ankane/pghero/">PgHero&lt;/a> has bottled up many learnings into a new tool: &lt;a href="https://medium.com/@ankane/introducing-dexter-the-automatic-indexer-for-postgres-5f8fa8b28f27">Dexter&lt;/a>. I suspect we&amp;rsquo;ll get a look at all that went into this, how it works, and how you can leverage it to have a more automatically tuned database.&lt;/p>
&lt;h3 id="scaling-a-saas-application-beyond-a-single-postgres-with-citushttpspostgresqluseventssessionspgopen2017session376-scaling-a-saas-application-beyond-a-single-postgres-with-citus" >
&lt;div>
&lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/376-scaling-a-saas-application-beyond-a-single-postgres-with-citus/">Scaling a SaaS Application Beyond a Single Postgres with Citus&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>Migration talks are all too common, from Postgres to MySQL, from MySQL to Postgres, or from &lt;a href="https://containership.engineering/dynamodb-to-postgres-why-and-how-aa891681af4d">Dynamo to Postgres&lt;/a>. But this one has a little different flavor: from Postgres to sharded Postgres with &lt;a href="https://www.citusdata.com">Citus&lt;/a>. Sharding into a distributed system of course brings new things to consider and think about, and &lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/376-scaling-a-saas-application-beyond-a-single-postgres-with-citus/">here you&amp;rsquo;ll learn about them&lt;/a> from first-hand experience so hopefully you can avoid mistakes yourself.&lt;/p>
&lt;h3 id="concurrency-deep-divehttpspostgresqluseventssessionspgopen2017session374-concurrency-deep-dive" >
&lt;div>
&lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/374-concurrency-deep-dive/">Concurrency Deep Dive&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/374-concurrency-deep-dive/">This one&lt;/a> looks to be a great under the hood look as well as likely very practical. It&amp;rsquo;ll cover MVCC which is really at so much of the core of how Postgres works, but then bring it up to what it means for things like locks. Best of all, this one like so many others comes with lots of real world experience from Segment.&lt;/p>
&lt;h3 id="postgres-window-magichttpspostgresqluseventssessionspgopen2017session364-postgres-window-magic" >
&lt;div>
&lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/364-postgres-window-magic/">Postgres window magic&lt;/a>
&lt;/div>
&lt;/h3>
&lt;!-- raw HTML omitted -->
&lt;h3 id="running-postgresql--instagramhttpspostgresqluseventssessionspgopen2017session371-running-postgresql-instagram" >
&lt;div>
&lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/371-running-postgresql-instagram/">Running PostgreSQL @ Instagram&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>Instagram is well known as one of the largest apps in the world. They optimized and changed their setup multiple times and probably scaled in about every way possible. &lt;a href="https://postgresql.us/events/sessions/pgopen2017/session/371-running-postgresql-instagram/">Here we get to learn&lt;/a> about all the various things that go into running at a truly astonishing scale.&lt;/p>
&lt;h3 id="many-many-morehttpspostgresqluseventssessionspgopen2017" >
&lt;div>
&lt;a href="https://postgresql.us/events/sessions/pgopen2017/">Many many more&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>Of course there&amp;rsquo;s many more. Talks range from looks at new features, to how certain companies are using Postgres. We&amp;rsquo;ve got companies like Instacart and Instagram as mentioned giving talks, alongside Postgres core committers. Whether you want to learn about the inner workings of Postgres (which often hurts my brain) or how you can simply speed up your app, you should find something you like, as long as you like Postgres that is. Take a look at the &lt;a href="https://postgresql.us/events/sessions/pgopen2017/">full list of sessions&lt;/a> and we hope to see you there.&lt;/p></description></item><item><title>Working with time in Postgres</title><link>/2017/06/08/working-with-time-in-postgres/</link><pubDate>Thu, 08 Jun 2017 12:55:56 -0800</pubDate><guid>/2017/06/08/working-with-time-in-postgres/</guid><description>&lt;p>A massive amount of reporting queries, whether really intensive data analysis or just basic insights into your business, involve looking at data over a certain time period. Postgres has really rich support for dealing with time out of the box, something that&amp;rsquo;s often very underweighted when dealing with a database. Sure, if you have a time-series database it&amp;rsquo;s implied, but even then how flexible and friendly is it from a query perspective? With Postgres there&amp;rsquo;s a lot of key items available to you; let&amp;rsquo;s dig into the things that make your life easier when querying.&lt;/p>
&lt;h3 id="date-math" >
&lt;div>
Date math
&lt;/div>
&lt;/h3>
&lt;p>The most common thing I find myself doing is looking at users that have done something within some specific time window. If I&amp;rsquo;m executing this all from my app I can easily inject specific dates, but Postgres makes this really easy for you. Within Postgres you have a type called an interval, which is some window of time. And fortunately Postgres takes care of the heavy lifting of how something translates to or from hours/seconds/milliseconds/etc. Here are just a few examples of intervals:&lt;/p>
&lt;ul>
&lt;li>&amp;lsquo;1 day&amp;rsquo;::interval&lt;/li>
&lt;li>&amp;lsquo;5 days&amp;rsquo;::interval&lt;/li>
&lt;li>&amp;lsquo;1 week&amp;rsquo;::interval&lt;/li>
&lt;li>&amp;lsquo;30 days&amp;rsquo;::interval&lt;/li>
&lt;li>&amp;lsquo;1 month&amp;rsquo;::interval&lt;/li>
&lt;/ul>
&lt;p>&lt;em>A note: if you&amp;rsquo;re looking to subtract something like a full month, you actually want to use 1 month instead of trying to calculate the days yourself.&lt;/em>&lt;/p>
&lt;p>With a given interval you can easily shift some window of time, such as finding all users that have signed up for your service within the past week:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#f92672">*&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> users
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span> created_at &lt;span style="color:#f92672">&amp;gt;=&lt;/span> now() &lt;span style="color:#f92672">-&lt;/span> &lt;span style="color:#e6db74">&amp;#39;1 week&amp;#39;&lt;/span>::interval
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="date-functions" >
&lt;div>
Date functions
&lt;/div>
&lt;/h3>
&lt;p>Date math makes it pretty easy for you to go and find some specific set of data that applies, but what do you do when you want a broader report around time? There are a few options here. One is to leverage the built-in Postgres functions that help you work with dates and times. &lt;code>date_trunc&lt;/code> is one of the most used; it truncates a date down to some interval level. Here you can use the same general values as above, simply passing in the granularity to truncate to. So if we wanted to find the count of users that signed up per week:&lt;/p>
&lt;pre tabindex="0">&lt;code>SELECT date_trunc(&amp;#39;week&amp;#39;, created_at),
count(*)
FROM users
GROUP BY 1
ORDER BY 1 DESC;
&lt;/code>&lt;/pre>&lt;p>This gives us a nice roll-up of how many users signed up each week. What&amp;rsquo;s missing here though is any week with no signups. Because no users signed up, there is no row with a count of 0; the week simply doesn&amp;rsquo;t appear. If you did want something like this you could generate some range of time and then do a cross join with it against users to see which week each one fell into. To do this, first you&amp;rsquo;d generate a series of dates:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> generate_series(&lt;span style="color:#e6db74">&amp;#39;2017-01-01&amp;#39;&lt;/span>::date, now()::date, &lt;span style="color:#e6db74">&amp;#39;1 week&amp;#39;&lt;/span>::interval) weeks
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then we&amp;rsquo;re going to join this against the actual users table and check that the &lt;code>created_at&lt;/code> falls within the right range.&lt;/p>
&lt;pre tabindex="0">&lt;code>with weeks as (
select week
from generate_series(&amp;#39;2017-01-01&amp;#39;::date, now()::date, &amp;#39;1 week&amp;#39;::interval) week
)
SELECT weeks.week,
count(*)
FROM weeks,
users
WHERE users.created_at &amp;gt;= weeks.week
  AND users.created_at &amp;lt; (weeks.week + &amp;#39;1 week&amp;#39;::interval)
GROUP BY 1
ORDER BY 1 DESC;
&lt;/code>&lt;/pre>&lt;h3 id="timestamp-vs-timestamptz" >
&lt;div>
Timestamp vs. Timestamptz
&lt;/div>
&lt;/h3>
&lt;p>What about storing the times themselves? Postgres has two types of timestamps: a generic timestamp and one with the timezone embedded in it. In most cases you should opt for timestamptz. Why not timestamp? What happens if you move a server, or your server somehow swaps its timezone configuration? Or, perhaps more practically, what about daylight saving time? You might think you can simply put in the time as you see it, but when different countries around the world observe things like daylight saving time differently, it introduces complexities into your application.&lt;/p>
&lt;p>With timestamptz it&amp;rsquo;ll be aware of the timezone information as data comes in. Then when you query from a timezone that observes daylight saving time you&amp;rsquo;re all covered. There&amp;rsquo;s a &lt;a href="http://phili.pe/posts/timestamps-and-time-zones-in-postgresql/">number of articles&lt;/a> that cover the differences between timestamp and timestamp with timezone in more depth, so if you&amp;rsquo;re curious I encourage you to check them out, but by default you mostly just need to use timestamptz.&lt;/p>
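&lt;p>A small sketch of the behavior (the &lt;code>events&lt;/code> table is hypothetical):&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE TABLE events (happened_at timestamptz);

-- input is interpreted in the session timezone and stored as UTC
SET timezone = &amp;#39;America/Los_Angeles&amp;#39;;
INSERT INTO events VALUES (&amp;#39;2017-06-08 12:00:00&amp;#39;);

-- output is converted to whatever the current session timezone is
SET timezone = &amp;#39;UTC&amp;#39;;
SELECT happened_at FROM events;  -- 2017-06-08 19:00:00+00
&lt;/code>&lt;/pre>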
&lt;h3 id="more" >
&lt;div>
More
&lt;/div>
&lt;/h3>
&lt;p>There&amp;rsquo;s a number of other functions and capabilities when it comes to dealing with time in Postgres. You can &lt;code>extract&lt;/code> various parts of a timestamp or interval, such as the hour of the day or the month. You can grab the day of the week with &lt;code>dow&lt;/code>. And one of my favorites, which marks when we celebrate happy hour at Citus: there&amp;rsquo;s a literal for 00:00:00.00 UTC, &lt;a href="https://www.postgresql.org/message-id/20050124200645.GA6126%40winnie.fuhr.org">&lt;code>allballs&lt;/code>&lt;/a>. If you need to work with dates and times in Postgres I encourage you to check out the &lt;a href="https://www.postgresql.org/docs/current/static/functions-datetime.html">docs&lt;/a> before you try to re-write something of your own; chances are what you need is already there.&lt;/p></description></item><item><title>Why use Postgres (Updated for last 5 years)</title><link>/2017/04/30/why-postgres-five-years-later/</link><pubDate>Sun, 30 Apr 2017 12:55:56 -0800</pubDate><guid>/2017/04/30/why-postgres-five-years-later/</guid><description>&lt;p>Five years ago &lt;a href="http://www.craigkerstiens.com/2012/04/30/why-postgres/">I wrote a post&lt;/a> that got some good attention on why you should use Postgres. Almost a year later I &lt;a href="http://www.craigkerstiens.com/2012/05/07/why-postgres-part-2/">added a bunch of things&lt;/a> I missed. Many of those items bear repeating, and I&amp;rsquo;ll recap a few of those in the latter half of this post. But in the last 4-5 years there have been a lot of improvements and more reasons added to the list of why you should use Postgres. Here&amp;rsquo;s the rundown of the things that make Postgres a great database you should consider using.&lt;/p>
&lt;h3 id="datatypes-including-jsonb-and-range-types" >
&lt;div>
Datatypes, including JSONB and range types
&lt;/div>
&lt;/h3>
&lt;p>Postgres has long had an open and friendly attitude toward adding datatypes. It&amp;rsquo;s had arrays, geospatial types, and more for some time. A few years ago it got two datatypes worth thinking about using:&lt;/p>
&lt;h4 id="jsonb" >
&lt;div>
JSONB
&lt;/div>
&lt;/h4>
&lt;p>JSONB is a binary representation of JSON. It&amp;rsquo;s capable of being indexed with &lt;code>GIN&lt;/code> and &lt;code>GiST&lt;/code> index types, and you can query into the full JSON document for quick lookups.&lt;/p>
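&lt;p>As a minimal sketch (the &lt;code>events&lt;/code> table and its keys here are hypothetical):&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);
CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- the @&amp;gt; containment operator can use the GIN index
SELECT payload-&amp;gt;&amp;gt;&amp;#39;type&amp;#39;
FROM events
WHERE payload @&amp;gt; &amp;#39;{&amp;#34;type&amp;#34;: &amp;#34;signup&amp;#34;}&amp;#39;;
&lt;/code>&lt;/pre>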
&lt;h4 id="range-types" >
&lt;div>
Range types
&lt;/div>
&lt;/h4>
&lt;p>While it didn&amp;rsquo;t arrive to the same fame as JSONB, &lt;a href="https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf">range types&lt;/a> can be especially handy if they&amp;rsquo;re what you need. Within a single column you can store a range from one value to another, which is especially helpful for time ranges. If you&amp;rsquo;re building a calendaring application, or often have a from and a to timestamp, range types let you put that in a single column. The real benefit is that you can then add constraints, for example that certain time ranges can&amp;rsquo;t overlap, or whatever else makes sense for your application.&lt;/p>
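&lt;p>A sketch of that overlap constraint, with a hypothetical &lt;code>reservations&lt;/code> table (the &lt;code>btree_gist&lt;/code> extension is needed to mix plain equality with range operators in one exclusion constraint):&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
  room   int,
  during tstzrange,
  -- no two rows may have the same room with overlapping time ranges
  EXCLUDE USING GIST (room WITH =, during WITH &amp;amp;&amp;amp;)
);
&lt;/code>&lt;/pre>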
&lt;h3 id="extensions" >
&lt;div>
Extensions
&lt;/div>
&lt;/h3>
&lt;p>It&amp;rsquo;d be hard to talk about Postgres without the ecosystem around it. Extensions are increasingly key to the community and growth of Postgres. Extensions let you hook into Postgres natively without requiring the code to be committed back to the core of Postgres, which means they can add rich functionality without being tied to a Postgres release and review cycle. Some great examples of this are:&lt;/p>
&lt;h4 id="citus" >
&lt;div>
Citus
&lt;/div>
&lt;/h4>
&lt;p>&lt;a href="https://www.citusdata.com">Citus&lt;/a> (who I work for) turns Postgres into a distributed database allowing you to easily shard your database across multiple nodes. To your application it still looks like a single database, but then under the covers it&amp;rsquo;s spread across multiple physical machines and Postgres instances.&lt;/p>
&lt;h4 id="hyperloglog" >
&lt;div>
HyperLogLog
&lt;/div>
&lt;/h4>
&lt;p>This is a personal favorite of mine that allows you to easily keep close-enough distinct counts pre-aggregated, and then perform operations on them across days such as unions and intersections. &lt;a href="https://www.citusdata.com/blog/2017/04/04/distributed_count_distinct_with_postgresql/">HyperLogLog and other sketch algorithms&lt;/a> are common across large datasets and distributed systems, and it&amp;rsquo;s especially exciting to find them nearly out of the box in Postgres.&lt;/p>
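&lt;p>A rough sketch of the idea, assuming the postgresql-hll extension is installed and a hypothetical &lt;code>page_views&lt;/code> table:&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE EXTENSION hll;

-- one pre-aggregated sketch of distinct visitor ids per day
CREATE TABLE daily_uniques (day date, visitors hll);

INSERT INTO daily_uniques
SELECT visited_on, hll_add_agg(hll_hash_integer(visitor_id))
FROM page_views
GROUP BY 1;

-- approximate distinct visitors across all days: a union of the sketches
SELECT hll_cardinality(hll_union_agg(visitors)) FROM daily_uniques;
&lt;/code>&lt;/pre>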
&lt;h4 id="postgis" >
&lt;div>
PostGIS
&lt;/div>
&lt;/h4>
&lt;p>PostGIS isn&amp;rsquo;t new, but it&amp;rsquo;s worth highlighting again. It&amp;rsquo;s commonly regarded as the most advanced geospatial database. PostGIS adds advanced geospatial datatypes and operators, and makes it easy to do many of the location-based activities you need if you&amp;rsquo;re dealing with mapping or routing.&lt;/p>
&lt;h3 id="logical-replication" >
&lt;div>
Logical replication
&lt;/div>
&lt;/h3>
&lt;p>For many years the biggest knock against Postgres was the difficulty in setting up replication. Originally this was any form of replication, but then streaming replication came along (this is streaming of the binary WAL or write-ahead-log format). Tools like &lt;a href="https://github.com/wal-e/wal-e">wal-e&lt;/a> help leverage much of the Postgres mechanisms for things like disaster recovery.&lt;/p>
&lt;p>Then recent releases laid the foundation for logical replication, though it still required an &lt;a href="https://github.com/2ndQuadrant/pglogical">extension&lt;/a> to Postgres, so it wasn&amp;rsquo;t 100% out of the box. And then, finally, we got full logical replication. Logical replication sends a logical stream of the changes themselves rather than the binary WAL, which means you can replicate only certain tables or certain kinds of changes.&lt;/p>
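&lt;p>In Postgres 10 and up this is driven by publications and subscriptions; a minimal sketch with hypothetical names and connection details:&lt;/p>
&lt;pre tabindex="0">&lt;code>-- on the publishing server, pick just the tables you want
CREATE PUBLICATION analytics_pub FOR TABLE users, accounts;

-- on the subscribing server
CREATE SUBSCRIPTION analytics_sub
  CONNECTION &amp;#39;host=primary.example.com dbname=app user=replicator&amp;#39;
  PUBLICATION analytics_pub;
&lt;/code>&lt;/pre>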
&lt;h3 id="scale" >
&lt;div>
Scale
&lt;/div>
&lt;/h3>
&lt;p>In addition to all of the usability features, we&amp;rsquo;ve seen Postgres continue to get better and better at &lt;a href="https://www.slideshare.net/fuzzycz/postgresql-performance-improvements-in-95-and-96">performance&lt;/a>. In particular we now have the foundations for &lt;a href="https://www.postgresql.org/docs/current/static/parallel-query.html">parallelism&lt;/a>, and on some queries you&amp;rsquo;ll see much better performance. Then if you need even greater scale than single-node Postgres (such as 122 or 244 GB of RAM on RDS or Heroku), you have options like &lt;a href="https://www.citusdata.com">Citus&lt;/a>, mentioned earlier, that can help you scale out.&lt;/p>
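&lt;p>A quick way to see whether a query parallelizes is to look for a Gather node in the plan (the table name here is hypothetical):&lt;/p>
&lt;pre tabindex="0">&lt;code>SET max_parallel_workers_per_gather = 4;

-- aggregates over large tables are prime candidates; look for a
-- &amp;#34;Gather&amp;#34; node with &amp;#34;Workers Planned&amp;#34; in the output
EXPLAIN SELECT count(*) FROM page_views;
&lt;/code>&lt;/pre>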
&lt;h3 id="richer-indexing" >
&lt;div>
Richer indexing
&lt;/div>
&lt;/h3>
&lt;p>Postgres already had some pretty powerful indexing with &lt;a href="https://www.postgresql.org/docs/9.5/static/textsearch-indexes.html">GIN and GiST&lt;/a>, and those are now useful for JSONB. But we&amp;rsquo;ve also seen the arrival of KNN indexing and SP-GiST, with even more on the way.&lt;/p>
&lt;h3 id="upsert" >
&lt;div>
Upsert
&lt;/div>
&lt;/h3>
&lt;p>Upsert was a work in progress for several years. It was one of those features most people hacked around using &lt;a href="http://www.craigkerstiens.com/2013/11/18/best-postgres-feature-youre-not-using/">CTEs&lt;/a>, but that approach could create race conditions. It was also one of the few features MySQL had over Postgres. And just over a year ago we got &lt;a href="http://www.craigkerstiens.com/2015/05/08/upsert-lands-in-postgres-9.5/">official upsert&lt;/a> support.&lt;/p>
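&lt;p>The shape of it, assuming a hypothetical &lt;code>users&lt;/code> table with a unique constraint on &lt;code>email&lt;/code>:&lt;/p>
&lt;pre tabindex="0">&lt;code>-- insert, or update the existing row if the email already exists
INSERT INTO users (email, name)
VALUES (&amp;#39;craig@example.com&amp;#39;, &amp;#39;Craig&amp;#39;)
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name;
&lt;/code>&lt;/pre>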
&lt;h3 id="foreign-data-wrappers" >
&lt;div>
Foreign Data Wrappers
&lt;/div>
&lt;/h3>
&lt;p>Okay, yes, foreign data wrappers did exist many years ago. If you&amp;rsquo;re not familiar with them, foreign data wrappers allow you to map an external data system to tables directly in Postgres. This means you could, for example, interact with and query your &lt;a href="http://www.craigkerstiens.com/2012/10/18/connecting_to_redis_from_postgres/">Redis&lt;/a> database directly from Postgres with SQL. They&amp;rsquo;ve continued to improve beyond what we had over 5 years ago. In particular we got support for writeable foreign data wrappers, meaning you can write data to other systems directly from Postgres. There&amp;rsquo;s also now an official Postgres FDW which comes out of the box with Postgres, and it by itself is quite useful for querying across various Postgres instances.&lt;/p>
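&lt;p>Setting up the Postgres FDW looks roughly like this (the hostnames and credentials here are placeholders):&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE EXTENSION postgres_fdw;

CREATE SERVER other_pg FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host &amp;#39;other-host.example.com&amp;#39;, dbname &amp;#39;otherdb&amp;#39;);

CREATE USER MAPPING FOR CURRENT_USER SERVER other_pg
  OPTIONS (user &amp;#39;reporting&amp;#39;, password &amp;#39;secret&amp;#39;);

-- pull the remote tables into a local schema, then query them like any table
CREATE SCHEMA remote;
IMPORT FOREIGN SCHEMA public FROM SERVER other_pg INTO remote;
&lt;/code>&lt;/pre>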
&lt;h3 id="much-more" >
&lt;div>
Much more
&lt;/div>
&lt;/h3>
&lt;p>And if you missed the &lt;a href="http://www.craigkerstiens.com/2012/04/30/why-postgres/">earlier editions&lt;/a> of this, please feel free to &lt;a href="http://www.craigkerstiens.com/2012/05/07/why-postgres-part-2/">check them out&lt;/a>. The cliff-notes version of them includes:&lt;/p>
&lt;ul>
&lt;li>Window functions&lt;/li>
&lt;li>Functions&lt;/li>
&lt;li>Custom languages (PLV8 anyone?)&lt;/li>
&lt;li>NoSQL datatypes&lt;/li>
&lt;li>Custom functions&lt;/li>
&lt;li>Common table expressions&lt;/li>
&lt;li>Concurrent index creation&lt;/li>
&lt;li>Transactional DDL&lt;/li>
&lt;li>Foreign Data Wrappers&lt;/li>
&lt;li>Conditional and functional indexes&lt;/li>
&lt;li>Listen/Notify&lt;/li>
&lt;li>Table inheritance&lt;/li>
&lt;li>Per transaction synchronous replication&lt;/li>
&lt;/ul>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend who has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Getting started with JSONB in Postgres</title><link>/2017/03/12/getting-started-with-jsonb-in-postgres/</link><pubDate>Sun, 12 Mar 2017 12:55:56 -0800</pubDate><guid>/2017/03/12/getting-started-with-jsonb-in-postgres/</guid><description>&lt;p>JSONB is an awesome datatype in Postgres. I find myself using it on a weekly basis these days. Often when using some API (such as &lt;a href="https://www.clearbit.com">clearbit&lt;/a>) I&amp;rsquo;ll get a JSON response back; instead of parsing that out into a table structure, it&amp;rsquo;s really easy to throw it into a JSONB column and then query for various parts of it.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re not familiar with JSONB, it&amp;rsquo;s a binary representation of JSON in your database. You can read a bit more about it vs. JSON &lt;a href="https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/">here&lt;/a>.&lt;/em>&lt;/p>
&lt;p>In working with JSONB here&amp;rsquo;s a few quick tips to get up and running with it even faster:&lt;/p>
&lt;h3 id="indexing" >
&lt;div>
Indexing
&lt;/div>
&lt;/h3>
&lt;p>For the most part you don&amp;rsquo;t have to think too much about this. With Postgres&amp;rsquo;s powerful indexing types you can add one index and have everything within the JSON document, all the keys and all the values, automatically indexed. The key here is to add a &lt;code>GIN&lt;/code> index. Once this is done, queries where you&amp;rsquo;re searching for some value should be much faster:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">INDEX&lt;/span> idx_data &lt;span style="color:#66d9ef">ON&lt;/span> companies &lt;span style="color:#66d9ef">USING&lt;/span> GIN (&lt;span style="color:#66d9ef">data&lt;/span>);
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="querying" >
&lt;div>
Querying
&lt;/div>
&lt;/h3>
&lt;p>Querying is a little more work, but once you get the basics it&amp;rsquo;s pretty straightforward. There are a few new operators you&amp;rsquo;ll want to quickly ramp up on, and from there querying becomes easy.&lt;/p>
&lt;p>For the most basic part, you now have an operator to traverse down the various keys. First let&amp;rsquo;s get some idea of what the JSON looks like so we have something to work with. Here&amp;rsquo;s a sample set of data that we get back from Clearbit:&lt;/p>
&lt;pre tabindex="0">&lt;code>{
&amp;#34;domain&amp;#34;: &amp;#34;citusdata.com&amp;#34;,
&amp;#34;company&amp;#34;: {
&amp;#34;id&amp;#34;: &amp;#34;b1ff2bdf-0d8d-4d6d-8bcc-313f6d45996a&amp;#34;,
&amp;#34;url&amp;#34;: &amp;#34;http:\/\/citusdata.com&amp;#34;,
&amp;#34;logo&amp;#34;: &amp;#34;https:\/\/logo.clearbit.com\/citusdata.com&amp;#34;,
&amp;#34;name&amp;#34;: &amp;#34;Citus Data&amp;#34;,
&amp;#34;site&amp;#34;: {
&amp;#34;h1&amp;#34;: null,
&amp;#34;url&amp;#34;: &amp;#34;http:\/\/citusdata.com&amp;#34;,
&amp;#34;title&amp;#34;: &amp;#34;Citus Data&amp;#34;
},
&amp;#34;tags&amp;#34;: [
&amp;#34;SAAS&amp;#34;,
&amp;#34;Enterprise&amp;#34;,
&amp;#34;B2B&amp;#34;,
&amp;#34;Information Technology &amp;amp; Services&amp;#34;,
&amp;#34;Technology&amp;#34;,
&amp;#34;Software&amp;#34;
],
&amp;#34;domain&amp;#34;: &amp;#34;citusdata.com&amp;#34;,
&amp;#34;twitter&amp;#34;: {
&amp;#34;id&amp;#34;: &amp;#34;304455171&amp;#34;,
&amp;#34;bio&amp;#34;: &amp;#34;Builders of Citus, the extremely scalable PostgreSQL database.&amp;#34;,
&amp;#34;site&amp;#34;: &amp;#34;https:\/\/t.co\/hKpZjIy7Ej&amp;#34;,
&amp;#34;avatar&amp;#34;: &amp;#34;https:\/\/pbs.twimg.com\/profile_images\/630900468995108865\/GJFCCXrv_normal.png&amp;#34;,
&amp;#34;handle&amp;#34;: &amp;#34;citusdata&amp;#34;,
&amp;#34;location&amp;#34;: &amp;#34;San Francisco, CA&amp;#34;,
&amp;#34;followers&amp;#34;: 3770,
&amp;#34;following&amp;#34;: 570
},
&amp;#34;category&amp;#34;: {
&amp;#34;sector&amp;#34;: &amp;#34;Information Technology&amp;#34;,
&amp;#34;industry&amp;#34;: &amp;#34;Internet Software &amp;amp; Services&amp;#34;,
&amp;#34;subIndustry&amp;#34;: &amp;#34;Internet Software &amp;amp; Services&amp;#34;,
&amp;#34;industryGroup&amp;#34;: &amp;#34;Software &amp;amp; Services&amp;#34;
},
&amp;#34;emailProvider&amp;#34;: false
}
}
&lt;/code>&lt;/pre>&lt;p>Sorry it&amp;rsquo;s a bit long, but it gives us a good example to work with.&lt;/p>
&lt;h3 id="basic-lookups" >
&lt;div>
Basic lookups
&lt;/div>
&lt;/h3>
&lt;p>Now let&amp;rsquo;s query something fairly basic, the domain:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">#&lt;/span> &lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#66d9ef">data&lt;/span>&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#e6db74">&amp;#39;domain&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> companies
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span> &lt;span style="color:#66d9ef">domain&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;citusdata.com&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">LIMIT&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">?&lt;/span>&lt;span style="color:#66d9ef">column&lt;/span>&lt;span style="color:#f92672">?&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-----------------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#e6db74">&amp;#34;citusdata.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>-&amp;gt;&lt;/code> is likely the first operator you&amp;rsquo;ll use with JSONB; it&amp;rsquo;s helpful for traversing the JSON. Though if you&amp;rsquo;re looking to get the value as text you&amp;rsquo;ll actually want to use &lt;code>-&amp;gt;&amp;gt;&lt;/code>. Instead of a quoted response or JSON object back, you&amp;rsquo;ll get the value as plain text, which is a bit cleaner:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">#&lt;/span> &lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#66d9ef">data&lt;/span>&lt;span style="color:#f92672">-&amp;gt;&amp;gt;&lt;/span>&lt;span style="color:#e6db74">&amp;#39;domain&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> companies
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span> &lt;span style="color:#66d9ef">domain&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;citusdata.com&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">LIMIT&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">?&lt;/span>&lt;span style="color:#66d9ef">column&lt;/span>&lt;span style="color:#f92672">?&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-----------------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> citusdata.com
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="filtering-for-values" >
&lt;div>
Filtering for values
&lt;/div>
&lt;/h3>
&lt;p>Now with something like Clearbit you may want to filter for only certain types of companies. We can see in the example data that there&amp;rsquo;s a bunch of tags. If we wanted to find only companies that have the tag B2B, we could use the &lt;code>?&lt;/code> operator once we&amp;rsquo;ve traversed down to that part of the JSON. The &lt;code>?&lt;/code> operator tells us whether a JSON object contains a top-level key, or whether an array contains a given string:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#f92672">*&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> companies
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span> &lt;span style="color:#66d9ef">data&lt;/span>&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#e6db74">&amp;#39;company&amp;#39;&lt;/span>&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#e6db74">&amp;#39;tags&amp;#39;&lt;/span> &lt;span style="color:#f92672">?&lt;/span> &lt;span style="color:#e6db74">&amp;#39;B2B&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="jsonb-but-pretty" >
&lt;div>
JSONB but pretty
&lt;/div>
&lt;/h3>
&lt;p>In querying JSONB you&amp;rsquo;ll typically get a nice compressed set of JSON back. While this is all fine if you&amp;rsquo;re putting it into your application, if you&amp;rsquo;re manually debugging and testing things you probably want something a bit more readable. Of course Postgres has your back here and you can wrap your JSONB with a pretty print function:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> jsonb_pretty(&lt;span style="color:#66d9ef">data&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> companies;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="much-more" >
&lt;div>
Much more
&lt;/div>
&lt;/h3>
&lt;p>There&amp;rsquo;s a lot more in the &lt;a href="https://www.postgresql.org/docs/9.5/static/functions-json.html">docs&lt;/a> that you can find handy for the specialized cases when you need them. &lt;code>jsonb_each&lt;/code> will expand a JSONB document into individual rows, so if you wanted to count the number of occurrences of every tag for a company, this would help. If you want to parse a JSONB document out into a row/record, there&amp;rsquo;s &lt;code>jsonb_to_record&lt;/code>. The docs are your friend for just about everything you want to do, but hopefully these few steps help kick-start things if you want to get started with &lt;code>JSONB&lt;/code>.&lt;/p></description></item><item><title>Simple but handy Postgres features</title><link>/2017/01/08/simple-but-handy-postgresql-features/</link><pubDate>Sun, 08 Jan 2017 12:55:56 -0800</pubDate><guid>/2017/01/08/simple-but-handy-postgresql-features/</guid><description>&lt;p>It seems each week when I&amp;rsquo;m reviewing data with someone a feature comes up that they had no idea existed within Postgres. In an effort to continue documenting many of the features and functionality that are useful, here&amp;rsquo;s a list of just a few that you may find handy the next time you&amp;rsquo;re working with your data.&lt;/p>
&lt;h3 id="psql-and-e" >
&lt;div>
Psql, and \e
&lt;/div>
&lt;/h3>
&lt;p>This one I&amp;rsquo;ve &lt;a href="http://www.craigkerstiens.com/2013/02/13/How-I-Work-With-Postgres/">covered before&lt;/a>, but it&amp;rsquo;s worth restating. Psql is a great tool that already comes with Postgres. If you&amp;rsquo;re comfortable on the CLI you should consider giving it a try. You can even set up your own &lt;code>.psqlrc&lt;/code> so that it&amp;rsquo;s well customized to your liking. In particular, turning &lt;code>\timing&lt;/code> on is especially useful. But even with all sorts of customization, if you&amp;rsquo;re not aware that you can use your preferred editor with &lt;code>\e&lt;/code> then you&amp;rsquo;re missing out. This will open the last run query in your editor, let you edit and save it, and then run it for you. Vim, Emacs, even Sublime Text works; just take your pick by setting your &lt;code>$EDITOR&lt;/code> environment variable.&lt;/p>
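&lt;p>As a starting point, a minimal &lt;code>.psqlrc&lt;/code> might look like this (the settings shown are personal preference, not requirements):&lt;/p>
&lt;pre tabindex="0">&lt;code>\set QUIET 1
-- show how long each query took
\timing on
-- switch to expanded display automatically for wide results
\x auto
-- make NULLs visible
\pset null &amp;#39;[null]&amp;#39;
\set QUIET 0
&lt;/code>&lt;/pre>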
&lt;h3 id="watch" >
&lt;div>
Watch
&lt;/div>
&lt;/h3>
&lt;p>Ever sit at a terminal running a query over and over to see if something on your system changed? If you&amp;rsquo;re debugging something, whether locally or live in production, watching data change can be key to figuring out what&amp;rsquo;s happening. Instead of re-running your query yourself, you can simply use the &lt;code>\watch&lt;/code> command, which will re-run your query automatically every couple of seconds (or at an interval you give it, such as &lt;code>\watch 5&lt;/code>):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> now() &lt;span style="color:#f92672">-&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> query_start,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">state&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> query
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> pg_stat_activity
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">\&lt;/span>watch
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="jsonb-pretty-print" >
&lt;div>
JSONB pretty print
&lt;/div>
&lt;/h3>
&lt;p>I love &lt;a href="https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/">JSONB&lt;/a> as a datatype. Yes, in some cases it won&amp;rsquo;t be &lt;a href="http://blog.heapanalytics.com/when-to-avoid-jsonb-in-a-postgresql-schema/">optimal&lt;/a> for performance (though at times it can be perfectly fine). If I&amp;rsquo;m hitting some API that returns a ton of data, I&amp;rsquo;m usually not using all of it right away, but you never know when you&amp;rsquo;ll want the rest of it. I use &lt;a href="https://www.clearbit.com">Clearbit&lt;/a> this way today, and for safety&amp;rsquo;s sake I save the whole JSON result instead of de-normalizing it. Unfortunately, when you query this in Postgres you get one giant compressed block of JSON. Yes, you could pipe out to something like jq, or you could simply use Postgres&amp;rsquo;s built-in function to make it legible:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> jsonb_pretty(clearbit_response)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> lookup_data;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> jsonb_pretty
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-------------------------------------------------------------------------------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;person&amp;#34;&lt;/span>: &lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;id&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;063f6192-935b-4f31-af6b-b24f63287a60&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;bio&amp;#34;&lt;/span>: &lt;span style="color:#66d9ef">null&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;geo&amp;#34;&lt;/span>: &lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;lat&amp;#34;&lt;/span>: &lt;span style="color:#ae81ff">37&lt;/span>.&lt;span style="color:#ae81ff">7749295&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;lng&amp;#34;&lt;/span>: &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">122&lt;/span>.&lt;span style="color:#ae81ff">4194155&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;city&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;San Francisco&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;state&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;California&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;country&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;United States&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;stateCode&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;CA&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;countryCode&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;US&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;name&amp;#34;&lt;/span>: &lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="importing-my-data-into-google" >
&lt;div>
Importing my data into Google
&lt;/div>
&lt;/h3>
&lt;p>This one isn&amp;rsquo;t Postgres specific, but I use it on a weekly basis and it&amp;rsquo;s key for us at &lt;a href="https://www.citusdata.com">Citus&lt;/a>. If you use something like Heroku Postgres, dataclips is an extremely handy feature that gives you a real-time view of a query and its results, including an anonymous URL you can share. At Citus, much like we did at Heroku Postgres, we have a dashboard in Google Sheets which pulls in this data in real time. To do this, simply select a cell and then put in: &lt;code>=importdata(&amp;quot;pathtoyourdataclip.csv&amp;quot;)&lt;/code>. Google will import any data this way as long as it&amp;rsquo;s in CSV form. It&amp;rsquo;s a great lightweight way to build out a dashboard for your business without rolling your own complicated dashboarding or building a complex ETL pipeline.&lt;/p>
&lt;p>I&amp;rsquo;m sure I&amp;rsquo;m missing a ton of the smaller features that you use on a daily basis. Let me know &lt;a href="https://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a> which ones I forgot that you feel should be listed.&lt;/p></description></item><item><title>Syncing from Postgres to Salesforce - Data Mappings</title><link>/2016/11/23/syncing-from-postgres-to-salesforce-part-1/</link><pubDate>Wed, 23 Nov 2016 12:55:56 -0800</pubDate><guid>/2016/11/23/syncing-from-postgres-to-salesforce-part-1/</guid><description>&lt;p>For the second time now I&amp;rsquo;ve had to implement a system that syncs from my system of record into Salesforce.com, the first at Heroku and now at &lt;a href="https://www.citusdata.com">Citus Data&lt;/a>. The case here is pretty simple: I have a software-as-a-service, B2B product. In both cases it&amp;rsquo;s a homegrown application in Ruby, but it could be Python, .NET, or any language of your choosing. The problem is I don&amp;rsquo;t want to rebuild my own CRM, reporting, etc. on top of my internal database. And as soon as you&amp;rsquo;re at some reasonable size (a sales team of 1 or more) you need to be able to give others insight into what&amp;rsquo;s in that system-of-record database.&lt;/p>
&lt;p>While my tooling isn&amp;rsquo;t a full-fledged product by any means, here&amp;rsquo;s a bit of how I&amp;rsquo;ve developed this process a few times over, along with some of the annoying bits of code to help get you started. In this post I&amp;rsquo;ll walk through some of the basic datatypes; then we&amp;rsquo;ll follow up with the overall architecture and the tweaks you need to make to Salesforce; and finally we&amp;rsquo;ll provide some example code to help you create this setup yourself.&lt;/p>
&lt;h3 id="leads-contacts-accounts-oh-my" >
&lt;div>
Leads, Contacts, Accounts oh my
&lt;/div>
&lt;/h3>
&lt;p>Despite being one of the largest as-a-service vendors in the world, Salesforce is still primarily set up for traditional high-touch sales. What this means is that some of the data you&amp;rsquo;ll commonly have, or in this case not have, can make it difficult to figure out what maps from your internal system to Salesforce. Within Salesforce there are really four key data models you&amp;rsquo;re going to care about.&lt;/p>
&lt;h3 id="lead-vs-contact" >
&lt;div>
Lead vs. Contact
&lt;/div>
&lt;/h3>
&lt;p>In every as-a-service product you&amp;rsquo;ll have some user that creates an account, which usually has an email address tied to it. This seems simple enough to load into Salesforce, as there is a clear email field. Within Salesforce there are two key data types with a default field for this: lead and contact. In Salesforce terms a lead is someone &lt;a href="https://success.salesforce.com/answers?id=90630000000gvTiAAI">considering doing business with you&lt;/a>, while a contact is someone already doing business with you. If you have a freemium or timed-trial model you might think to start by classifying everyone as a lead, then turning them into a contact when they convert to a paying customer.&lt;/p>
&lt;p>If you&amp;rsquo;re anything like me, in running your SaaS business you want a sign-up process that&amp;rsquo;s frictionless. This means: give me an email address and password, and you&amp;rsquo;re off and running. Salesforce immediately starts to break down a bit in this regard. First, both leads and contacts require a first and last name. In my case I do ask for a name, and do a little bit of work on the code side to get values into both fields. You&amp;rsquo;ll see later that our process does result in some regular cleanup work needing to happen, but in our case we&amp;rsquo;re optimizing to get users signed up more than capturing every detail about them perfectly from the start.&lt;/p>
&lt;p>Leads are even more broken than contacts though. Leads require you to enter a company. While you may be able to just drop a company form field onto your sign-up page, you&amp;rsquo;re likely to end up with junk data at least, if not actually driving some sign-ups away. Some of my favorite pieces of junk data I&amp;rsquo;ve seen users enter for company name: &amp;ldquo;pissed off developer&amp;rdquo;, &amp;ldquo;Acme Inc.&amp;rdquo;, and the all too common &amp;ldquo;Test Co.&amp;rdquo;. In reality these are often real developers, with real problems and real budget; they just don&amp;rsquo;t want to share details before they&amp;rsquo;re ready.&lt;/p>
&lt;p>So in this case the TL;DR is that leads require:&lt;/p>
&lt;ul>
&lt;li>First name&lt;/li>
&lt;li>Last name&lt;/li>
&lt;li>Email&lt;/li>
&lt;li>Company name&lt;/li>
&lt;/ul>
&lt;p>This results in contacts being a more favorable datatype because it only requires:&lt;/p>
&lt;ul>
&lt;li>First name&lt;/li>
&lt;li>Last name&lt;/li>
&lt;li>Email&lt;/li>
&lt;/ul>
&lt;h3 id="accounts-vs-opportunities" >
&lt;div>
Accounts vs. Opportunities
&lt;/div>
&lt;/h3>
&lt;p>We have a similar but different dichotomy with Accounts and Opportunities as we did with Leads and Contacts, though this one can often map a bit more cleanly than we saw with leads. From a &lt;a href="https://success.salesforce.com/answers?id=90630000000gnvcAAA">pretty straightforward definition&lt;/a>:&lt;/p>
&lt;ul>
&lt;li>Account - A business entity. Contacts work for Accounts.&lt;/li>
&lt;li>Opportunities - Sales events related to an Account and one or more Contacts.&lt;/li>
&lt;/ul>
&lt;p>This again can become problematic if you have no notion of accounts at all in your system of record, though if you are building a B2B application there is a good chance you have something that makes sense. If you let users free-form enter this, then instead of AT&amp;amp;T they may put &amp;ldquo;interactive team&amp;rdquo;, but you at least have some logical team that in their mind they roll up to.&lt;/p>
&lt;p>Opportunities are a much harder one in the SaaS world. In traditional marketing you have your standard stages of MQL (Marketing Qualified Lead), progressing to SQL (Sales Qualified Lead), etc., that you expect potential customers to flow through. In the as-a-service world people may look from afar for weeks, then suddenly sign up, give you a credit card, and start paying within minutes. While there are still steps the customer may go through before buying, you often have less insight into them. How you decide to structure your opportunities flow is entirely up to you. In my case I tend to opt to still have them, but on an exception basis where a salesperson is actively engaged vs. the other 90%+ of fully self-service customers.&lt;/p>
&lt;p>Shifting back a little bit to accounts. The key with accounts is that if you have some notion of a team or org within your system of record, then it makes sense to have that same structure set up in Salesforce. The most basic version of this might be an idea of &amp;ldquo;Account owner&amp;rdquo; and &amp;ldquo;Team members&amp;rdquo;. You may have a person in there just for billing, an admin, and then users. Even if you don&amp;rsquo;t want to recreate the entire structure, at least having all the contacts tied to the account is critical. I can&amp;rsquo;t count the number of times I&amp;rsquo;ve seen teams set up a &amp;ldquo;&lt;a href="mailto:billing@mycompany.com">billing@mycompany.com&lt;/a>&amp;rdquo; email, and seen people try to interact with that address when in reality they wanted to be talking to &amp;ldquo;&lt;a href="mailto:jane@mycompany.com">jane@mycompany.com&lt;/a>&amp;rdquo; who logged in yesterday.&lt;/p>
&lt;h3 id="in-summary" >
&lt;div>
In summary
&lt;/div>
&lt;/h3>
&lt;p>For the most part Salesforce doesn&amp;rsquo;t quite let you map your data from your system of record the way many of you will want to. Expect to have to contort a bit, and likely pump Salesforce with some garbage data. In general you&amp;rsquo;ll want to skip leads and go straight for contacts, as contacts don&amp;rsquo;t carry the same restrictions. Tying contacts to an account is the right level anyway, and from there it&amp;rsquo;s up to you how you manage the opportunities.&lt;/p>
</description></item><item><title>Open DNS for when DNS outages occur</title><link>/2016/10/21/opendns/</link><pubDate>Fri, 21 Oct 2016 12:55:56 -0800</pubDate><guid>/2016/10/21/opendns/</guid><description>&lt;p>OpenDNS is a DNS resolver that caches records beyond their TTL when the upstream DNS server cannot be reached. In cases like today&amp;rsquo;s major outage it can be handy to swap your DNS settings over to it, or it may be worth using as a standard default. Resolution may be a bit slow, since it first checks whether the upstream server can be reached, but it can at least get you back to a working state.&lt;/p>
&lt;p>If you know what you&amp;rsquo;re doing then all you need to do is configure your DNS settings to: &lt;code>208.67.220.220&lt;/code> and &lt;code>208.67.222.222&lt;/code>.&lt;/p>
&lt;p>If you need a little more guidance you can go into your System Preferences on Mac, select Network, then Advanced and finally the DNS tab. You should set it up to look as follows:&lt;/p>
&lt;p>&lt;img src="https://d3vv6lp55qjaqc.cloudfront.net/items/0c2T1M251T0D3r1D2Q3x/Network.png?X-CloudApp-Visitor-Id=e4475d145dcf11ebcffabf840edcc11f&amp;amp;v=cd767ce0" alt="DNS Configuration">&lt;/p></description></item><item><title>A tour of Postgres' Foreign Data Wrappers</title><link>/2016/09/11/a-tour-of-fdws/</link><pubDate>Sun, 11 Sep 2016 12:55:56 -0800</pubDate><guid>/2016/09/11/a-tour-of-fdws/</guid><description>&lt;p>SQL can be a powerful language for reporting. Whether you&amp;rsquo;re just exploring some data, or generating reports that show &lt;a href="/2014/02/26/Tracking-MoM-growth-in-SQL/">month over month revenue growth&lt;/a> it&amp;rsquo;s the &lt;a href="https://www.amazon.com/SQL-Relational-Theory-Write-Accurate/dp/1491941170/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1473612603&amp;amp;sr=1-1&amp;amp;keywords=sql+relational&amp;amp;tag=mypred-20">lingua franca&lt;/a> for data analysis. But your data isn&amp;rsquo;t always in a SQL database; even then, if you&amp;rsquo;re using Postgres you can still likely use SQL to analyze, query, and even join with that data. Foreign data wrappers have been around for years in Postgres, but are continuing to mature and be a great option for joining disparate systems.&lt;/p>
&lt;h3 id="overview-of-foreign-data-wrappers" >
&lt;div>
Overview of foreign data wrappers
&lt;/div>
&lt;/h3>
&lt;p>If you&amp;rsquo;re unfamiliar, foreign data wrappers, or FDWs, allow you to connect from within Postgres to a remote system, and then query it directly from within Postgres. While there is an official Postgres FDW that ships with Postgres itself and allows you to connect from one Postgres DB to another, there&amp;rsquo;s also a broad community of others.&lt;/p>
&lt;p>At the core of it, Postgres provides certain APIs under the covers which each FDW extension can implement. This can include the ability to map SQL to whatever makes sense for a given system, to push down various operators like WHERE clauses, and, as of Postgres 9.3, even to write data.&lt;/p>
&lt;p>To set up an FDW you first install the extension, then provide the connection to the remote system, set up your schema/tables, and then you&amp;rsquo;re off to the races–or, well, ready to query. If you&amp;rsquo;ve got more than 2-3 databases or systems in your infrastructure, you&amp;rsquo;ll often benefit from FDWs as opposed to introducing a heavyweight ETL pipeline. Don&amp;rsquo;t mistake FDWs for the most performant method of joining data, but they are often the most developer-time-efficient means of joining these data sets.&lt;/p>
&lt;p>Let&amp;rsquo;s look at just a few of the more popular and interesting ones.&lt;/p>
&lt;h3 id="postgres-fdw" >
&lt;div>
Postgres FDW
&lt;/div>
&lt;/h3>
&lt;p>The Postgres one is the easiest to get started with. First you&amp;rsquo;ll just enable it with &lt;code>CREATE EXTENSION&lt;/code>, then you&amp;rsquo;ll set up your remote server:&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION postgres_fdw;
CREATE SERVER core_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'foo', dbname 'core_db', port '5432');
&lt;/code>&lt;/pre>
&lt;p>Then you&amp;rsquo;ll create the user that has access to that database:&lt;/p>
&lt;pre>&lt;code>CREATE USER MAPPING FOR bi SERVER core_db OPTIONS (user 'bi', password 'secret');
&lt;/code>&lt;/pre>
&lt;p>Finally, create your foreign table:&lt;/p>
&lt;pre>&lt;code>CREATE FOREIGN TABLE core_users (
id integer NOT NULL,
username varchar(255),
password varchar(255),
last_login timestamptz
)
SERVER core_db;
&lt;/code>&lt;/pre>
&lt;p>Now you&amp;rsquo;ll see a new table in the database you created this in called &lt;code>core_users&lt;/code>. You can query this table just like you&amp;rsquo;d expect:&lt;/p>
&lt;pre>&lt;code>SELECT *
FROM core_users
WHERE last_login &amp;gt;= now() - '1 day'::interval;
&lt;/code>&lt;/pre>
&lt;p>You can also join against local tables, such as getting all the invoices for users that have logged in within the last month:&lt;/p>
&lt;pre>&lt;code>SELECT *
FROM invoices, core_users
WHERE core_users.last_login &amp;gt;= now() - '1 month'::interval
AND invoices.user_id = core_users.id;
&lt;/code>&lt;/pre>
&lt;p>Hopefully this is all straightforward enough, but let&amp;rsquo;s also take a quick look at some of the other interesting ones:&lt;/p>
&lt;h3 id="mysql-fdw" >
&lt;div>
MySQL FDW
&lt;/div>
&lt;/h3>
&lt;p>For MySQL you&amp;rsquo;ll also have to &lt;a href="https://github.com/EnterpriseDB/mysql_fdw">download it&lt;/a> and install it, since it doesn&amp;rsquo;t ship directly with Postgres. This should be fairly straightforward:&lt;/p>
&lt;pre>&lt;code>$ export PATH=/usr/local/pgsql/bin/:$PATH
$ export PATH=/usr/local/mysql/bin/:$PATH
$ make USE_PGXS=1
$ make USE_PGXS=1 install
&lt;/code>&lt;/pre>
&lt;p>Now that you&amp;rsquo;ve built it you&amp;rsquo;d follow a very similar path to setting it up as we did for Postgres:&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION mysql_fdw;
CREATE SERVER mysql_server
FOREIGN DATA WRAPPER mysql_fdw
OPTIONS (host '127.0.0.1', port '3306');
CREATE USER MAPPING FOR postgres
SERVER mysql_server
OPTIONS (username 'foo', password 'bar');
CREATE FOREIGN TABLE core_users (
id integer NOT NULL,
username varchar(255),
password varchar(255),
last_login timestamptz
)
SERVER mysql_server;
&lt;/code>&lt;/pre>
&lt;p>But MySQL, while different from Postgres, is still much closer in SQL support than, say, a more exotic NoSQL store. How well do those work as foreign data wrappers? Let&amp;rsquo;s look at our next one:&lt;/p>
&lt;h3 id="mongodb" >
&lt;div>
MongoDB
&lt;/div>
&lt;/h3>
&lt;p>First you&amp;rsquo;ll go through much of the &lt;a href="https://github.com/EnterpriseDB/mongo_fdw">same setup&lt;/a> as you did for MySQL. The one major difference is in the final step of setting up the &lt;code>table&lt;/code>. Since a table doesn&amp;rsquo;t quite map the same way in Mongo, you have the ability to set two items: 1. the database and 2. the collection name.&lt;/p>
&lt;pre>&lt;code>CREATE FOREIGN TABLE core_users(
_id NAME,
user_id int,
user_username text,
user_last_login timestamptz)
SERVER mongo_server
OPTIONS (database 'db', collection 'users');
&lt;/code>&lt;/pre>
&lt;p>With this you can do some basic level of filtering as well:&lt;/p>
&lt;pre>&lt;code>SELECT *
FROM core_users
WHERE user_last_login &amp;gt;= now() - '1 day'::interval;
&lt;/code>&lt;/pre>
&lt;p>You can now also write and delete data using just SQL:&lt;/p>
&lt;pre>&lt;code>DELETE FROM core_users
WHERE user_id = 100;
&lt;/code>&lt;/pre>
&lt;p>Of course, just putting SQL on top of Mongo doesn&amp;rsquo;t mean you get all the flexibility of analysis that you&amp;rsquo;d have directly within Postgres, but it does go a long way towards allowing you to analyze data that lives across two different systems.&lt;/p>
&lt;h3 id="many-more" >
&lt;div>
Many more
&lt;/div>
&lt;/h3>
&lt;p>A few years ago there were some key ones which already made FDWs useful. Now there&amp;rsquo;s a rich list covering probably every system you could want. Whether it&amp;rsquo;s &lt;a href="http://www.craigkerstiens.com/2012/10/18/connecting_to_redis_from_postgres/">Redis&lt;/a>, a simple &lt;a href="https://www.postgresql.org/docs/9.5/static/file-fdw.html">CSV&lt;/a> one, or something newer like &lt;a href="https://github.com/snaga/monetdb_fdw">MonetDB&lt;/a> chances are you can find an &lt;a href="https://wiki.postgresql.org/wiki/Foreign_data_wrappers#NoSQL_Database_Wrappers">FDW&lt;/a> for the system you need that makes your life easier.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend who has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Five mistakes beginners make when working with databases</title><link>/2016/06/07/five-mistakes-databases/</link><pubDate>Tue, 07 Jun 2016 12:55:56 -0800</pubDate><guid>/2016/06/07/five-mistakes-databases/</guid><description>&lt;p>When you start out as a developer there&amp;rsquo;s an overwhelming amount of things to grasp. First there&amp;rsquo;s the language itself, then all the quirks of the specific framework you&amp;rsquo;re using, and after that (or maybe before) we&amp;rsquo;ll throw front-end development into the mix, and somewhere along the line you have to decide where to store your data.&lt;/p>
&lt;p>Early on, with so many things to quickly master, the database tends to be an afterthought in application design (perhaps because it doesn&amp;rsquo;t make an obvious impact on end-user experience). As a result there are a number of bad practices that tend to get picked up when working with databases; here&amp;rsquo;s a rundown of just a few.&lt;/p>
&lt;h3 id="1-storing-images" >
&lt;div>
1. Storing images
&lt;/div>
&lt;/h3>
&lt;p>Images don&amp;rsquo;t belong in your database. Just because you can do something doesn&amp;rsquo;t mean you should. Images take up a massive amount of space in databases, and slow applications down by unnecessarily eating your database&amp;rsquo;s IO resources. The most common way this mistake occurs is when new developers base64-encode an image and store it in a large text/blob field in the database.&lt;/p>
&lt;p>The better approach is to upload your images directly to a service like Amazon S3, then store the image URL (hosted by Amazon) in your database as a text field. This way, each time you need to load an image, you simply output the image URL into a valid &lt;code>&amp;lt;img&amp;gt;&lt;/code> tag. This will greatly improve website responsiveness, and generally helps scale web applications.&lt;/p>
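&lt;p>As a minimal sketch of the schema side of this (table and column names hypothetical), all the database holds is a URL:&lt;/p>
&lt;pre>&lt;code>CREATE TABLE photos (
  id serial PRIMARY KEY,
  user_id integer NOT NULL,
  image_url text NOT NULL -- e.g. the S3 URL, never the image bytes
);
&lt;/code>&lt;/pre>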
&lt;h3 id="2-limitoffset" >
&lt;div>
2. Limit/Offset
&lt;/div>
&lt;/h3>
&lt;p>Pagination is extremely common in a number of applications. As soon as you start to learn SQL, the most straightforward way to handle pagination is to &lt;code>ORDER BY&lt;/code> some column then &lt;code>LIMIT&lt;/code> the number of results returned, and for each extra page you&amp;rsquo;ll &lt;code>OFFSET&lt;/code> by so many records. This all seems entirely logical, until you realize at any moderate scale:&lt;/p>
&lt;ol>
&lt;li>The load this exerts on your database will be painful.&lt;/li>
&lt;li>It isn&amp;rsquo;t deterministic, should records change as the user flips between pages.&lt;/li>
&lt;/ol>
&lt;p>The unfortunate part is that pagination is quite complex, and there isn&amp;rsquo;t a one-size-fits-all solution. For more information on solving pagination problems, you can check out &lt;a href="https://www.citusdata.com/blog/1872-joe-nelson/409-five-ways-paginate-postgres-basic-exotic">numerous options&lt;/a>.&lt;/p>
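&lt;p>To make the contrast concrete, here&amp;rsquo;s a minimal sketch of one common alternative, keyset pagination, against a hypothetical &lt;code>users&lt;/code> table. Instead of &lt;code>OFFSET&lt;/code>, you filter on the last value seen on the previous page:&lt;/p>
&lt;pre>&lt;code>-- page 1
SELECT id, username
FROM users
ORDER BY id
LIMIT 25;

-- page 2: pass in the last id seen on page 1 (say, 25)
SELECT id, username
FROM users
WHERE id &amp;gt; 25
ORDER BY id
LIMIT 25;
&lt;/code>&lt;/pre>
&lt;p>This walks the primary key index rather than scanning and discarding rows, and pages stay stable even as earlier rows are inserted or deleted.&lt;/p>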
&lt;h3 id="3-integer-primary-keys" >
&lt;div>
3. Integer primary keys
&lt;/div>
&lt;/h3>
&lt;p>The default for almost all ORMs when creating a primary key is to create a serial field: a sequence that auto-increments, with that number used as your primary key. This seems straightforward as an admin, because you can browse from /users/1 to /users/2, etc. And for most applications this is fine. But you&amp;rsquo;ll soon realize as you start to scale that integer primary keys can be exhausted, and are not ideal for large-scale systems. Further, you&amp;rsquo;re reliant on that single system generating your keys; if a time comes when you have to scale, the pain here will be huge. The better approach is to start &lt;a href="https://til.hashrocket.com/posts/31a5135e19-generate-a-uuid-in-postgresql">taking advantage of UUIDs&lt;/a> from the start.&lt;/p>
&lt;p>&lt;em>There&amp;rsquo;s also the bonus advantage of not accidentally showcasing to users how many users/listings/whatever the key references.&lt;/em>&lt;/p>
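&lt;p>As a sketch of what that looks like in Postgres (this assumes the &lt;code>pgcrypto&lt;/code> extension, which provides &lt;code>gen_random_uuid()&lt;/code>):&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE users (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  email text NOT NULL
);
&lt;/code>&lt;/pre>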
&lt;h3 id="4-default-values-on-new-columns" >
&lt;div>
4. Default values on new columns
&lt;/div>
&lt;/h3>
&lt;p>No matter how long you&amp;rsquo;ve been at it you won&amp;rsquo;t get the perfect schema on day 1. It&amp;rsquo;s better to think of database schemas as continuously evolving documents. Fortunately, it&amp;rsquo;s easy to add a column to your database, but it&amp;rsquo;s also easy to do this in a horrific way. By default, if you just add a column it&amp;rsquo;ll generally allow NULL values. This operation is fast, but most applications don&amp;rsquo;t truly want null values in their data; instead they want to set a default value.&lt;/p>
&lt;p>If you do add a column with a default value on the table, this will trigger a full rewrite of your table. &lt;em>Note: this is very bad for any sizable table in an application.&lt;/em> Instead, it&amp;rsquo;s far better to allow null values at first so the operation is instant, then set your default, and finally backfill the data retroactively with a background process.&lt;/p>
&lt;p>This is more complicated than it should be, but fortunately there are some &lt;a href="http://pedro.herokuapp.com/past/2011/7/13/rails_migrations_with_no_downtime/">handy guides&lt;/a> to help.&lt;/p>
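&lt;p>The safe sequence above can be sketched as follows, assuming a hypothetical &lt;code>users&lt;/code> table gaining a &lt;code>plan&lt;/code> column:&lt;/p>
&lt;pre>&lt;code>-- instant: no table rewrite, existing rows are NULL
ALTER TABLE users ADD COLUMN plan text;

-- new rows get the default going forward
ALTER TABLE users ALTER COLUMN plan SET DEFAULT 'free';

-- backfill old rows in small batches from a background process
UPDATE users SET plan = 'free'
WHERE id IN (SELECT id FROM users WHERE plan IS NULL LIMIT 1000);
&lt;/code>&lt;/pre>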
&lt;h3 id="5-over-normalization" >
&lt;div>
5. Over normalization
&lt;/div>
&lt;/h3>
&lt;p>As you start to learn about normalization it feels like the right thing to do. You create a &lt;code>posts&lt;/code> table, which contains &lt;code>authors&lt;/code>, and each post belongs in a category. So you create a &lt;code>categories&lt;/code> table, and then you create a join table &lt;code>post_categories&lt;/code>. At the real root of it there&amp;rsquo;s not anything fundamentally wrong with normalizing your data, but at a certain point there are diminishing returns.&lt;/p>
&lt;p>In the above case categories could very easily just be an array of varchar fields on a post. Normalization makes plenty of sense, but every time you create a many-to-many join table, it&amp;rsquo;s worth a second thought on whether you really need a full table on both sides.&lt;/p>
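&lt;p>For example (a sketch, assuming an array column fits your query patterns), categories as an array on &lt;code>posts&lt;/code> with a GIN index for lookups:&lt;/p>
&lt;pre>&lt;code>CREATE TABLE posts (
  id serial PRIMARY KEY,
  title text,
  categories text[]
);

CREATE INDEX idx_posts_categories ON posts USING GIN (categories);

-- find posts in the 'postgres' category
SELECT id, title
FROM posts
WHERE categories @&amp;gt; ARRAY['postgres'];
&lt;/code>&lt;/pre>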
&lt;p>&lt;em>Edit: It&amp;rsquo;s probably worth saying that under-normalization is also a problem as well. There isn&amp;rsquo;t a one size fits all here. In general there are times where it does make sense to have a completely de-normalized and a completely normalized approach. As &lt;a href="https://twitter.com/fuzzychef/status/740248400243785728">@fuzzychef&lt;/a> described: &amp;ldquo;use an appropriate amount of normalization i.e. The goldilocks principle&amp;rdquo;&lt;/em>&lt;/p>
&lt;h3 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h3>
&lt;p>When I asked about this on Twitter I got some pretty great responses, but they were all over the map: from the basics of never looking at the queries the ORM is generating, to much more advanced topics such as isolation levels. The one I didn&amp;rsquo;t hit on that does seem worthwhile for anyone building a real-world app is indexing. Knowing how &lt;a href="http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance/">indexing works&lt;/a>, and understanding &lt;a href="http://www.craigkerstiens.com/2013/05/30/a-collection-of-indexing-tips/">what indexes&lt;/a> you need to create, is a critical part of getting good database performance. There are a number of posts on indexing that teach the basics, as well as &lt;a href="http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance/">practical steps&lt;/a> for analyzing performance with Postgres.&lt;/p>
&lt;p>In general, I encourage you to treat the database as another tool in your chest as opposed to a necessary evil, but hopefully, the above tips will at least prevent you from making some initial mistakes as you dig in as a beginner.&lt;/p>
&lt;p>&lt;em>Special thanks to &lt;a href="https://twitter.com/mdeggies">@mdeggies&lt;/a> and &lt;a href="https://twitter.com/rdegges">@rdegges&lt;/a> for the initial conversation at PyCon that sparked this post.&lt;/em>&lt;/p></description></item><item><title/><link>/2016/02/28/</link><pubDate>Sun, 28 Feb 2016 12:55:56 -0800</pubDate><guid>/2016/02/28/</guid><description>&lt;p>&lt;em>&lt;strong>Notice&lt;/strong>: Much of this post still applies, but now applies more directly to Citus. Since this post was originally published, pg_shard has been deprecated. You can find further guidance for sharding on the Citus blog and docs.&lt;/em>&lt;/p>
&lt;p>Back in 2012 I wrote an overview of database sharding. Since then I&amp;rsquo;ve had a few questions about it, which have really increased in frequency over the last two months. As a result I thought I&amp;rsquo;d do a deeper dive with some actual hands-on for sharding. Though for this hands-on, because I do value my time, I&amp;rsquo;m going to take advantage of &lt;code>pg_shard&lt;/code> rather than creating the mechanisms from scratch.&lt;/p>
&lt;p>For those unfamiliar, &lt;a href="https://github.com/citusdata/pg_shard/">pg_shard&lt;/a> is an open source extension from &lt;a href="http://citusdata.com">Citus Data&lt;/a>, who have a commercial product that you can think of as pg_shard++ (and probably much more). Pg_shard adds a little extra to let data automatically distribute to other Postgres tables (logical shards) and Postgres databases/instances (physical shards), thus letting you outgrow a single Postgres node pretty simply.&lt;/p>
&lt;p>Alright, enough talk about it, let&amp;rsquo;s get things up and running.&lt;/p>
&lt;h3 id="build-install" >
&lt;div>
Build, install
&lt;/div>
&lt;/h3>
&lt;p>&lt;em>The rest assumes you have Postgres.app, version 9.5, set up and are on a Mac; most of these steps can be easily adapted for other Postgres installs or OSes.&lt;/em>&lt;/p>
&lt;pre>&lt;code>PATH=/Applications/Postgres.app/Contents/Versions/latest/bin/:$PATH make
sudo PATH=/Applications/Postgres.app/Contents/Versions/latest/bin/:$PATH make install
cp /Applications/Postgres.app/Contents/Versions/9.5/share/postgresql/postgresql.conf.sample /Applications/Postgres.app/Contents/Versions/9.5/share/postgresql/postgresql.conf
&lt;/code>&lt;/pre>
&lt;p>Edit your &lt;code>postgresql.conf&lt;/code>:&lt;/p>
&lt;pre>&lt;code>#shared_preload_libraries = ''
&lt;/code>&lt;/pre>
&lt;p>TO:&lt;/p>
&lt;pre>&lt;code>shared_preload_libraries = 'pg_shard'
&lt;/code>&lt;/pre>
&lt;p>Then create a file in &lt;code>/Users/craig/Library/Application\ Support/Postgres/var-9.5/pg_worker_list.conf&lt;/code> where &lt;code>craig&lt;/code> is your username:&lt;/p>
&lt;pre>&lt;code># hostname port-number
localhost 5432
localhost 5433
&lt;/code>&lt;/pre>
&lt;p>You&amp;rsquo;ll also need to create a new Postgres instance:&lt;/p>
&lt;pre>&lt;code>initdb -D /Users/craig/Library/Application\ Support/Postgres/var-9.5-2
&lt;/code>&lt;/pre>
&lt;p>Then edit the &lt;code>postgresql.conf&lt;/code> inside that newly created folder, changing the port:&lt;/p>
&lt;pre>&lt;code>port = 5432
&lt;/code>&lt;/pre>
&lt;p>To&lt;/p>
&lt;pre>&lt;code>port = 5433
&lt;/code>&lt;/pre>
&lt;p>Finally, create our database, then start up the second instance:&lt;/p>
&lt;pre>&lt;code>createdb instagram
postgres -D /Users/craig/Library/Application\ Support/Postgres/var-9.5-2
&lt;/code>&lt;/pre>
&lt;h3 id="setup" >
&lt;div>
Setup
&lt;/div>
&lt;/h3>
&lt;p>Now you should have two running instances of Postgres, so let&amp;rsquo;s finally turn on the pg_shard extension, create some tables, and see what we have. First connect to your main running Postgres instance, in this case the instagram database we first created (&lt;code>psql instagram&lt;/code>), then let&amp;rsquo;s set things up:&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION pg_shard;
CREATE TABLE customer_reviews (customer_id TEXT NOT NULL, review_date DATE, review_rating INTEGER, product_id CHAR(10));
CREATE TABLE
Time: 4.734 ms
SELECT master_create_distributed_table(table_name := 'customer_reviews', partition_column := 'customer_id');
master_create_distributed_table
---------------------------------
(1 row)
SELECT master_create_worker_shards(table_name := 'customer_reviews', shard_count := 16, replication_factor := 2);
master_create_worker_shards
-----------------------------
(1 row)
&lt;/code>&lt;/pre>
&lt;h3 id="understanding-and-using" >
&lt;div>
Understanding and using
&lt;/div>
&lt;/h3>
&lt;p>So that was a lot of initial setup. But now we have an application that could in theory scale as a sharded application across 16 shards. If you want a refresher, there&amp;rsquo;s a difference between physical and logical shards. In the case above we have 16 logical ones, replicated across 2 physical Postgres instances, albeit on the same machine.&lt;/p>
&lt;p>Alright so a little more poking under the covers to see what happened before we actually start doing something with our data. If you&amp;rsquo;re still connected go ahead and run &lt;code>\d&lt;/code>, and you should see:&lt;/p>
&lt;pre>&lt;code> List of relations
Schema | Name | Type | Owner
--------+------------------------+-------+-------
public | customer_reviews | table | craig
public | customer_reviews_10000 | table | craig
public | customer_reviews_10001 | table | craig
public | customer_reviews_10002 | table | craig
public | customer_reviews_10003 | table | craig
public | customer_reviews_10004 | table | craig
public | customer_reviews_10005 | table | craig
public | customer_reviews_10006 | table | craig
public | customer_reviews_10007 | table | craig
public | customer_reviews_10008 | table | craig
public | customer_reviews_10009 | table | craig
public | customer_reviews_10010 | table | craig
public | customer_reviews_10011 | table | craig
public | customer_reviews_10012 | table | craig
public | customer_reviews_10013 | table | craig
public | customer_reviews_10014 | table | craig
public | customer_reviews_10015 | table | craig
(17 rows)
&lt;/code>&lt;/pre>
&lt;p>You can see that under the covers there are a lot more &lt;code>customer_reviews&lt;/code> tables; in reality you don&amp;rsquo;t have to think about these or do anything with them. But just for reference, they&amp;rsquo;re plain old Postgres tables, and you can query them and poke at the data. The now-mystical &lt;code>customer_reviews&lt;/code> will actually roll up the data across all your logical shards (tables) and physical shards (spanning across machines).&lt;/p>
&lt;p>&lt;em>It&amp;rsquo;s also of note that in production you might not actually use your primary DB as a worker; we did this more for expediency in setting it up on a local Mac. More typically you&amp;rsquo;d have 2 or more workers which are not the same as the primary; these were the ports we set up in our &lt;code>pg_worker_list.conf&lt;/code>.&lt;/em> A common setup would look something more like:&lt;/p>
&lt;p>&lt;img src="https://s3.amazonaws.com/f.cl.ly/items/3T2N2Q1K041g0a0L0j03/Untitled.png?v=7df00f6b" alt="">&lt;/p>
&lt;p>So now start inserting away:&lt;/p>
&lt;pre>&lt;code>INSERT INTO customer_reviews (customer_id, review_rating) VALUES ('HN802', 5);
INSERT INTO customer_reviews (customer_id, review_rating) VALUES ('FA2K1', 10);
&lt;/code>&lt;/pre>
&lt;p>For extra homework on your own you can now go and poke at where the underlying data actually surfaced.&lt;/p>
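&lt;p>For example (your shard numbers will vary), you can query a logical shard table directly to see which one a given row landed in:&lt;/p>
&lt;pre>&lt;code>-- peek into one logical shard
SELECT customer_id, review_rating FROM customer_reviews_10004;

-- versus the distributed table, which rolls up all shards
SELECT customer_id, review_rating FROM customer_reviews;
&lt;/code>&lt;/pre>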
&lt;h3 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h3>
&lt;p>Yes, there are a number of limitations that you can learn a bit more about over on the &lt;a href="https://github.com/citusdata/pg_shard#limitations">github repo for pg_shard&lt;/a>. Though even with those it&amp;rsquo;s very usable as is, and lets you get quite far in prepping an app for sharding. While I will say that all apps think they&amp;rsquo;ll need sharding and few actually do, given &lt;code>pg_shard&lt;/code> it&amp;rsquo;s minimal extra effort to plan for such scaling should you need it.&lt;/p>
&lt;p>Up next we&amp;rsquo;ll look at how it&amp;rsquo;d work with a few languages, so you can get an idea of the end to end experience.&lt;/p></description></item><item><title>What being a PM is really like - Software is easy, People are hard</title><link>/2016/01/28/What-being-a-PM-is-really-like-Software-is-easy-People-are-hard/</link><pubDate>Thu, 28 Jan 2016 12:55:56 -0800</pubDate><guid>/2016/01/28/What-being-a-PM-is-really-like-Software-is-easy-People-are-hard/</guid><description>&lt;p>In recent months I&amp;rsquo;ve had the question nearly once a week about advice/tips for becoming a Product Manager or more commonly referred to as PM. These are generally coming from people that are either currently engineers, or previously were and are in some engineer/customer role such as a sales engineer or solution architect. There&amp;rsquo;s a number of &lt;a href="http://www.amazon.com/Inspired-Create-Products-Customers-Love/dp/0981690408?tag=mypred-20">high level&lt;/a> pieces talking about PM and it often feels glorious, I mean you get to make product decisions right? You get to call some shots. Well that sometimes may be true, but don&amp;rsquo;t assume it&amp;rsquo;s all rainbows and sparkles.&lt;/p>
&lt;p>Especially as a first time PM what your day to day will look like won&amp;rsquo;t be debating strategy all day long. Here&amp;rsquo;s a few of the good and the bad sides of being a PM.&lt;/p>
&lt;h3 id="plenty-of-grunt-work" >
&lt;div>
Plenty of grunt work
&lt;/div>
&lt;/h3>
&lt;p>While you may get to make a decision or two, the bulk of your time will not be spent thinking about grandiose visions; instead you&amp;rsquo;ll be doing a lot of data gathering. There are a lot of means for gathering data across lots of sources, and the more you use the better off you&amp;rsquo;ll be. Knowing the ones you steer towards, as well as the ones you steer away from, is useful so you can balance your biases more fairly. For myself SQL is a go-to, then customer interactions both qualitative and quantitative such as surveys; following what media is saying about your space is important as well. And while user studies are often relegated to design and UX, as a PM you need to make sure they at least happen (&lt;a href="http://www.invisionapp.com/">Invision App&lt;/a> is a favorite for lightweight tests).&lt;/p>
&lt;p>In a given week I probably spend 10 hrs interacting with customers, looking at data, and sadly that&amp;rsquo;s probably not enough.&lt;/p>
&lt;p>&lt;em>A few practical examples of this&lt;/em>&lt;/p>
&lt;p>Each morning I send emails to 10-20 users who used the product for the first time. Yes, this is automated, but carving out 30 minutes of my day to actually follow up with each of them is less automated.&lt;/p>
&lt;p>Another example is keeping a health-of-business dashboard up to date. Personally I use google sheets for this. Within one spreadsheet I have monthly and weekly targets as well as how we’re tracking against them. These are all updated from actual real time data, powered by Heroku’s dataclips with a simple &lt;code>=IMPORTDATA("http://dataclips.heroku.com/abcdefghij….csv")&lt;/code>. In total my google spreadsheet has 1 high level overview, with about 20 underlying sheets that do all of the computations. In any given month 1 of my key 4-5 goals may be missed, which then spawns digging in deeper to figure out why and what we can do about it.&lt;/p>
&lt;h3 id="dictating-vs-consensus" >
&lt;div>
Dictating vs. consensus
&lt;/div>
&lt;/h3>
&lt;p>From a product decision making perspective you can force alignment by explicitly making every decision, or you can allow decisions to be made as a group, voting if needed. Expect to use some balance of both of these among the team, and neither is ever perfect. When it comes to outside the team you may still use both, but steer more strongly one way or the other. For example with the executive team it may be more consensus, with marketing it may be dictating your product roadmap which they can help support.&lt;/p>
&lt;p>Even within the team there will be times a decision must be made and there will be some people that don&amp;rsquo;t align. It&amp;rsquo;s key that you make the decision clearly and explicitly. Even though some individuals don&amp;rsquo;t like it, they won&amp;rsquo;t fight against it&amp;hellip; unless you make a habit of taking the input, then discarding it and going along with your &amp;lsquo;intuition&amp;rsquo;. Even when there&amp;rsquo;s a strong case based on the data, it may not be as clear as you think.&lt;/p>
&lt;p>In contrast, with decision making by consensus, most people will feel happier that they provided input into the product. But if you take everyone&amp;rsquo;s input, expect to end up with a product that feels like 10 people designed it; needless to say, incoherent.&lt;/p>
&lt;p>As a PM expect to do a lot of listening, a good bit of convincing, and some occasional big decision making.&lt;/p>
&lt;h3 id="the-pain-you-feel-inside-the-building-doesnt-matter" >
&lt;div>
The pain you feel inside the building doesn&amp;rsquo;t matter
&lt;/div>
&lt;/h3>
&lt;p>You may think you&amp;rsquo;re solving a problem that exists for users, when in reality there was no problem at all. This is just a reminder that you need to keep empathy in mind above so much else.&lt;/p>
&lt;p>As an example, once a data team put in place a tool, supposedly for me. I looked at the tool and more or less didn&amp;rsquo;t understand why it was in place. They proceeded to explain that my problem was it took me too long to write SQL, so this tool will help me get the reports I need without SQL. At that point I proceeded to actually list off all the issues I did have, none of which &lt;a href="http://www.craigkerstiens.com/categories/postgres/">were SQL&lt;/a>.&lt;/p>
&lt;p>Prescribing a solution without knowing clearly &lt;strong>from customers&lt;/strong> what the problem is will leave you in a bad spot. All this means you have to set aside building the tool you want to use, and make sure you know what the &lt;a href="http://headrush.typepad.com/creating_passionate_users/2005/01/keeping_users_e.html">customer wants&lt;/a>.&lt;/p>
&lt;h3 id="marketing-is-your-job" >
&lt;div>
Marketing is your job
&lt;/div>
&lt;/h3>
&lt;p>Not external marketing, though often that work may still fall to you, but rather internal marketing.&lt;/p>
&lt;p>It’s important that you internally market the wins for your &lt;em>team&lt;/em>. These wins should very much be for the team, and not for your own benefit. The best PMs seem to disappear into the background; this is because you’re more surfacing all the work your team is doing than any of the details you helped coordinate. This is often counter to our natural instinct to tout our own accomplishments. It’s only exaggerated in the PM role, one where things can still ship if you’re not there, so there can often be a tendency to try to highlight the role’s value. Fight that urge to self market.&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>Rest assured though–getting the team focused on solving the right problems, and then surfacing their wins will only help you go faster.&lt;/p>
&lt;h3 id="on-leading" >
&lt;div>
On leading
&lt;/div>
&lt;/h3>
&lt;!-- raw HTML omitted -->
&lt;p>At the end of the day ensuring the product is advancing is your job, so be prepared to do whatever is needed to accomplish that, whether it’s leading or not.&lt;/p>
&lt;h3 id="its-not-all-rainbows-but-it-is-fun" >
&lt;div>
It’s not all rainbows, but it is fun
&lt;/div>
&lt;/h3>
&lt;p>The range of things you’ll have to focus on can be diverse and complex. In the end if you get a rush out of shipping and launching products, then all the work that goes into it can make it all worthwhile. It’s as much about figuring out what customers want as it is about getting your team building the right thing. For a first time PM it can be summed up by the notion that software is easy, people are hard.&lt;/p>
&lt;p>&lt;em>Special thanks to &lt;a href="http://www.twitter.com/lukasfittl">Lukas Fittl&lt;/a> and &lt;a href="http://www.twitter.com/iamclovin">Arun Thampi&lt;/a> for reviews and feedback on this post.&lt;/em>&lt;/p></description></item><item><title>Marketing definitions for developers</title><link>/2016/01/17/Marketing-definitions-for-developers/</link><pubDate>Sun, 17 Jan 2016 12:55:56 -0800</pubDate><guid>/2016/01/17/Marketing-definitions-for-developers/</guid><description>&lt;p>Marketing often feels like a dirty-icky thing to many developers. Well, that is until you have a great product but no one using it; then you have to get a crash course in all of it. And while I might cover some of the actual basics in the future, just knowing what marketing people actually mean when they’re talking can be a huge jump start. Here&amp;rsquo;s a guide that distills many of the acronyms and terms down to what they actually mean in reality.&lt;/p>
&lt;p>&lt;strong>SEO - Search engine optimization.&lt;/strong> There are two sides to this: one where you&amp;rsquo;re attempting to game the system, known as black hat. The other is simply creating good content.&lt;/p>
&lt;p>Tip: Unlike several years ago, social sharing now helps impact this.&lt;/p>
&lt;p>&lt;strong>SEM - Search engine marketing.&lt;/strong> The short of this is adwords, but broadly it&amp;rsquo;s any search engine.&lt;/p>
&lt;p>Tip: Be wary here, you can spend a lot of money quickly. Properly managing it takes time and effort otherwise you&amp;rsquo;re wasting money.&lt;/p>
&lt;p>&lt;strong>Display ads&lt;/strong> - Banner ads on websites. There&amp;rsquo;s a few common form factors in this world, so you&amp;rsquo;ll create a few then reuse them across lots of properties.&lt;/p>
&lt;p>Tip: Results may vary here, there are some hidden gems when advertising on various long tail sites.&lt;/p>
&lt;p>&lt;strong>Retargeting&lt;/strong> - This is where you&amp;rsquo;re serving an ad (most commonly display) to someone that&amp;rsquo;s previously visited your site. The process happens due to you &amp;lsquo;pixeling&amp;rsquo; them, and a cookie being set so the ad server knows they&amp;rsquo;ve seen you.&lt;/p>
&lt;p>Tip: Generally good bang for the buck here, but you still need initial visitors to even retarget to.&lt;/p>
&lt;p>&lt;strong>Funnel&lt;/strong> - The process of someone going from finding you to paying to paying more. Generally a process will look something like: Anonymous visitor by referral, sign ups, low money bucket, big money bucket.&lt;/p>
&lt;p>&lt;strong>Top of the funnel&lt;/strong> - Hopefully clear from the previous one, top of the funnel would be the max of users you reach out to or get to your site, usually down to getting them to sign up.&lt;/p>
&lt;p>&lt;strong>Bottom of the funnel&lt;/strong> - This is usually going from time you have a user to customer and then growing that customer via cross-selling and upselling.&lt;/p>
&lt;p>&lt;strong>Drip marketing&lt;/strong> - This is the process of gradually sending emails/notifications to your customers to get them to engage and learn about the product. Think of it as a welcome email on day 1, an intro on day 3, and on day 5 a different email based on what they&amp;rsquo;ve done so far. Really good drip marketing will create a different email for the user based on what they have or haven&amp;rsquo;t done.&lt;/p>
&lt;p>&lt;strong>Attribution (last/first/multi)&lt;/strong> - Attribution relates to how you got the user or customer (via web referral). There&amp;rsquo;s a few different ways of looking at this, last touch is the last website they visited before signing up, first touch is the first referral they were sent from, and multi touch (often more complicated to put in place) attributes something to all referrals they&amp;rsquo;ve come from.&lt;/p>
&lt;p>&lt;strong>AR - Analyst relations&lt;/strong>. Analysts cover particular products or areas in an industry, write reports, and often consult with large enterprises when making buying decisions. Analyst relations or AR is the common term for interacting with them, you can &lt;a href="/2015/07/25/A-guide-to-analyst-relations-for-startups/">learn more here&lt;/a>&lt;/p>
&lt;p>&lt;strong>PR - Public relations&lt;/strong>. This is generally the press/media side. It often involves launches, press releases, pitching media etc. You can read more of a &lt;a href="/2015/07/21/An-intro-PR-guide-for-startups/">guide on it here&lt;/a>&lt;/p>
&lt;p>&lt;strong>Briefing&lt;/strong> - This is normally with an analyst or press, and is typically a quick 30 minute call, occasionally with a demo of an upcoming launch.&lt;/p>
&lt;p>&lt;strong>Inquiry&lt;/strong> - This refers more to the analyst side. Where a briefing is often more one sided for you to pitch/update them on what you’ve been doing, an inquiry is more a back and forth where you can ask what they’re seeing in the market and for input on direction.&lt;/p>
&lt;p>&lt;strong>Campaign&lt;/strong> - A collection of activities that go on around a certain thing focused on specific keywords or theme. This can be as little as a search engine marketing campaign which is the most common, or much larger and coordinated with billboards, webinars, etc.&lt;/p>
&lt;p>&lt;strong>Lead Gen&lt;/strong> - The sales funnel usually goes from just getting an email, to talking to them, to getting them to try a demo or run a POC, to eventually buying. Lead gen is the activity of just getting that initial contact so you can then further engage with them. In practice this can often involve giving something away, like a t-shirt in exchange for an email.&lt;/p>
&lt;h3 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h3>
&lt;p>While this doesn&amp;rsquo;t cover every marketing activity under the sun, hopefully it&amp;rsquo;s a good primer on things you may have heard but been confused by. If there&amp;rsquo;s important ones I&amp;rsquo;ve missed please feel free to let me know &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>&lt;/p></description></item><item><title>Writing more legible SQL</title><link>/2016/01/08/Writing-more-legible-SQL/</link><pubDate>Fri, 08 Jan 2016 12:55:56 -0800</pubDate><guid>/2016/01/08/Writing-more-legible-SQL/</guid><description>&lt;p>A number of times in a crowd I&amp;rsquo;ve asked how many people enjoy writing SQL, and often there&amp;rsquo;s a person or two. The follow up is how many people enjoy reading other people&amp;rsquo;s SQL and that&amp;rsquo;s unanimously 0. The reason for this is that so many people write bad SQL. It&amp;rsquo;s not that it doesn&amp;rsquo;t do the job, it&amp;rsquo;s just that people don&amp;rsquo;t tend to treat SQL the same as other languages and don&amp;rsquo;t follow strong code formatting guidelines. So, of course here&amp;rsquo;s some of my own recommendations on how to make SQL more readable.&lt;/p>
&lt;h3 id="one-thing-per-line" >
&lt;div>
One thing per line
&lt;/div>
&lt;/h3>
&lt;p>Only put a single column/table/join per line. This is going to make for slightly more verbose SQL, but it will be easier to read and edit. Here&amp;rsquo;s a basic example:&lt;/p>
&lt;pre>&lt;code>SELECT foo,
       bar
FROM baz
&lt;/code>&lt;/pre>
&lt;h3 id="align-your-projections-and-conditions" >
&lt;div>
Align your projections and conditions
&lt;/div>
&lt;/h3>
&lt;p>You can somewhat see this in the above with &lt;code>foo&lt;/code> and &lt;code>bar&lt;/code> being aligned in the same column. This is reasonably common for columns you&amp;rsquo;re selecting, but it&amp;rsquo;s not applied as often in &lt;code>AND&lt;/code> or &lt;code>GROUP BY&lt;/code> clauses. As you can see there is a difference, though, between:&lt;/p>
&lt;pre>&lt;code>SELECT foo,
bar
FROM baz
WHERE foo &amp;gt; 3
AND bar = 'craig.kerstiens@gmail.com'
&lt;/code>&lt;/pre>
&lt;p>And a cleaner version:&lt;/p>
&lt;pre>&lt;code>SELECT foo,
       bar
FROM baz
WHERE foo &amp;gt; 3
  AND bar = 'craig.kerstiens@gmail.com'
&lt;/code>&lt;/pre>
&lt;h3 id="use-column-names-when-groupingordering" >
&lt;div>
Use column names when grouping/ordering
&lt;/div>
&lt;/h3>
&lt;p>This is personally an awful habit of mine, but it is extremely convenient to just order by the column number. In the above query we could just &lt;code>ORDER BY 1&lt;/code>. This is especially easy when column 1 may be something like SUM(foo). However, ensuring you explicitly &lt;code>ORDER BY SUM(foo)&lt;/code> will help limit any misunderstanding of the data.&lt;/p>
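&lt;p>To make the contrast concrete, here&amp;rsquo;s a quick sketch with hypothetical table and column names:&lt;/p>
&lt;pre>&lt;code>-- convenient, but ambiguous for the next reader
SELECT user_id,
       SUM(amount)
FROM payments
GROUP BY 1
ORDER BY 2 DESC;

-- explicit, and much harder to misread
SELECT user_id,
       SUM(amount)
FROM payments
GROUP BY user_id
ORDER BY SUM(amount) DESC;
&lt;/code>&lt;/pre>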
&lt;h3 id="comments" >
&lt;div>
Comments
&lt;/div>
&lt;/h3>
&lt;p>You comment your code all the time, yet so few seem to comment their queries. A simple &lt;code>--&lt;/code> allows you to inline a comment, perhaps where there&amp;rsquo;s some oddities to what you&amp;rsquo;re joining or just anywhere it may need clarification. You can of course &lt;a href="/2013/07/29/documenting-your-postgres-database/">go much further&lt;/a>, but at least some basic level of commenting should be required.&lt;/p>
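&lt;p>Even a single line can save the next reader real time; a quick sketch with hypothetical names:&lt;/p>
&lt;pre>&lt;code>SELECT count(*)
FROM users
-- exclude internal test accounts from reporting
WHERE email NOT LIKE '%@mycompany.com';
&lt;/code>&lt;/pre>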
&lt;h3 id="casing" >
&lt;div>
Casing
&lt;/div>
&lt;/h3>
&lt;p>As highlighted in these examples, having a standard for how you case your queries is especially handy. Sticking with all SQL keywords in caps allows you to easily parse what is SQL and what are columns or literals that you&amp;rsquo;re using in queries.&lt;/p>
&lt;h3 id="ctes" >
&lt;div>
CTEs
&lt;/div>
&lt;/h3>
&lt;p>First, yes they can be an optimization boundary. But they can also make your query much more readable and prevent you from doing the wrong thing because you couldn&amp;rsquo;t reason about a query.&lt;/p>
&lt;p>For those unfamiliar, CTEs are like a view that exists just for the duration of the query being executed. You can have them reference previous CTEs so you can gradually build on them, much like you would with code blocks. I won&amp;rsquo;t repeat too much of what &lt;a href="/2013/11/18/best-postgres-feature-youre-not-using/">I&amp;rsquo;ve already written about them&lt;/a>, but if you&amp;rsquo;re unfamiliar with them or not using them &lt;a href="/2013/11/18/best-postgres-feature-youre-not-using/">they are a must&lt;/a>. CTEs are easily one of the pieces of SQL that I use on a daily basis.&lt;/p>
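&lt;p>If you&amp;rsquo;ve not seen them chained before, a minimal sketch (table and column names here are hypothetical) looks like:&lt;/p>
&lt;pre>&lt;code>WITH monthly_signups AS (
  SELECT date_trunc('month', created_at) AS month,
         count(*) AS signups
  FROM users
  GROUP BY 1
),
monthly_growth AS (
  SELECT month,
         signups,
         lag(signups) OVER (ORDER BY month) AS prior_signups
  FROM monthly_signups
)
SELECT month,
       signups,
       signups - prior_signups AS change
FROM monthly_growth
ORDER BY month;
&lt;/code>&lt;/pre>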
&lt;h3 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h3>
&lt;p>Of course this isn&amp;rsquo;t the only way to make your SQL more readable and this isn&amp;rsquo;t an exhaustive list. But hopefully you find these tips helpful, and for your favorite tip that I missed&amp;hellip; let me know about it &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>.&lt;/p>
&lt;p>&lt;em>A special thanks to &lt;a href="http://www.twitter.com/Case">@Case&lt;/a> for reviewing.&lt;/em>&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted --></description></item><item><title>My top 10 Postgres features and tips for 2016</title><link>/2015/12/29/My-top-10-Postgres-features-and-tips-for-2016/</link><pubDate>Tue, 29 Dec 2015 12:55:56 -0800</pubDate><guid>/2015/12/29/My-top-10-Postgres-features-and-tips-for-2016/</guid><description>&lt;p>I find during the holiday season many pick up &lt;a href="http://www.amazon.com/Hard-Thing-About-Things-Building/dp/0062273205/ref=sr_1_1?ie=UTF8&amp;amp;qid=1451407536&amp;amp;sr=8-1&amp;amp;keywords=hard+thing+about&amp;amp;tag=mypred-20">new books&lt;/a>, learn a &lt;a href="http://crystal-lang.org/">new language&lt;/a>, or brush up on some other skill in general. Here&amp;rsquo;s my contribution to hopefully giving you a few new things to learn about Postgres and ideally utilize in the new year. It&amp;rsquo;s not a top 10 list as much as 10 tips and tricks you should be aware of, as when you need them they become incredibly handy. But first, a shameless plug: if you find any of the following helpful, consider subscribing to &lt;a href="http://www.postgresweekly.com">Postgres weekly&lt;/a>, a weekly newsletter with interesting Postgres content.&lt;/p>
&lt;h3 id="1-ctes---common-table-expressions" >
&lt;div>
1. CTEs - Common Table Expressions
&lt;/div>
&lt;/h3>
&lt;p>CTEs allow you to do crazy awesome things like recursive queries, but even the most simple form of them I don&amp;rsquo;t go a day without using. Think of a CTE, commonly known as a with clause, as a view that exists only while that query is running. This lets you more easily create readable queries. A query that&amp;rsquo;s even &lt;a href="/2013/11/18/best-postgres-feature-youre-not-using/">100 lines long&lt;/a>, but built from 4-5 CTEs, is undoubtedly going to be easier for someone new to come in and understand than a 20 line query that does the same thing. A few people like writing SQL, but no one likes reading someone else&amp;rsquo;s, so do them a favor and read up on CTEs.&lt;/p>
&lt;h3 id="2-setup-a-psqlrc" >
&lt;div>
2. Setup a .psqlrc
&lt;/div>
&lt;/h3>
&lt;p>You set up a bashrc, a vimrc, etc. Why not do the same for Postgres? Some of the great things you can do:&lt;/p>
&lt;ul>
&lt;li>Set up pretty formatting by default with &lt;code>\x auto&lt;/code>&lt;/li>
&lt;li>Set nulls to actually look like something &lt;code>\pset null ¤&lt;/code>&lt;/li>
&lt;li>Turn timing on by default &lt;code>\timing on&lt;/code>&lt;/li>
&lt;li>Customize your prompt &lt;code>\set PROMPT1 '%[%033[33;1m%]%x%[%033[0m%]%[%033[1m%]%/%[%033[0m%]%R%# '&lt;/code>&lt;/li>
&lt;li>Save commonly run queries that you can run by name&lt;/li>
&lt;/ul>
&lt;p>Here&amp;rsquo;s an example of my own &lt;code>psqlrc&lt;/code>:&lt;/p>
&lt;pre>&lt;code>\set QUIET 1
\pset null '¤'
-- Customize prompts
\set PROMPT1 '%[%033[1m%][%/] # '
\set PROMPT2 '... # '
-- Show how long each query takes to execute
\timing
-- Use best available output format
\x auto
\set VERBOSITY verbose
\set HISTFILE ~/.psql_history- :DBNAME
\set HISTCONTROL ignoredups
\set COMP_KEYWORD_CASE upper
\unset QUIET
&lt;/code>&lt;/pre>
&lt;h3 id="3-pg_stat_statements-for-where-to-index" >
&lt;div>
3. pg_stat_statements for where to index
&lt;/div>
&lt;/h3>
&lt;p>&lt;code>pg_stat_statements&lt;/code> is probably the single most valuable tool for improving performance on your database. Once enabled (with &lt;code>create extension pg_stat_statements&lt;/code>) it automatically records all queries run against your database, how often they ran, and how long they took. This allows you to then go and find areas you can optimize to get overall time back with one simple query:&lt;/p>
&lt;pre>&lt;code>SELECT
  (total_time / 1000 / 60) as total_minutes,
  (total_time / calls) as average_time,
  query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 100;
&lt;/code>&lt;/pre>
&lt;p>&lt;em>Yes, there is some performance cost to leaving this always on, but it&amp;rsquo;s pretty small. I&amp;rsquo;ve found it&amp;rsquo;s far more useful to be on and get major performance wins vs. the small cost of it always recording.&lt;/em>&lt;/p>
&lt;p>You can read much more on Postgres performance on a &lt;a href="http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance/">previous post&lt;/a>&lt;/p>
&lt;h3 id="4-slow-down-with-etl-use-fdws" >
&lt;div>
4. Slow down with ETL, use FDWs
&lt;/div>
&lt;/h3>
&lt;p>If you have a lot of &lt;em>microservices&lt;/em> or different apps then you likely have a lot of different databases backing them. The default for about anything you want to do is to create some data warehouse and ETL it all together. This often goes a bit too far to the extreme of aggregating &lt;strong>everything&lt;/strong> together.&lt;/p>
&lt;p>For the times you just need to pull something together once or on rare occasion &lt;a href="http://www.craigkerstiens.com/2013/08/05/a-look-at-FDWs/">foreign data wrappers&lt;/a> will let you query from one Postgres database to another, or potentially from Postgres to anything else such as &lt;a href="https://github.com/citusdata/mongo_fdw">Mongo&lt;/a> or Redis.&lt;/p>
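&lt;p>Setup is only a few statements with &lt;code>postgres_fdw&lt;/code>; a rough sketch (the server name, credentials, and table here are all hypothetical):&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION postgres_fdw;

CREATE SERVER other_db
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'other-host', dbname 'reporting');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER other_db
  OPTIONS (user 'reporter', password 'secret');

CREATE FOREIGN TABLE remote_users (id int, email text)
  SERVER other_db
  OPTIONS (table_name 'users');

-- now queryable like any local table
SELECT count(*) FROM remote_users;
&lt;/code>&lt;/pre>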
&lt;h3 id="5-array-and-array_agg" >
&lt;div>
5. array and array_agg
&lt;/div>
&lt;/h3>
&lt;p>There&amp;rsquo;s little chance if you&amp;rsquo;re building an app you&amp;rsquo;re not using arrays somewhere within it. There&amp;rsquo;s no reason you shouldn&amp;rsquo;t be doing the same within your database as well. Arrays can be just another datatype within Postgres and have some great use cases like tags for blog posts directly in a single column.&lt;/p>
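&lt;p>As a quick sketch of the tags use case (a hypothetical schema):&lt;/p>
&lt;pre>&lt;code>CREATE TABLE posts (
  id    serial PRIMARY KEY,
  title text,
  tags  text[]
);

INSERT INTO posts (title, tags)
VALUES ('Why use Postgres', ARRAY['postgres', 'databases']);

-- find all posts tagged with postgres
SELECT title
FROM posts
WHERE tags @&amp;gt; ARRAY['postgres'];
&lt;/code>&lt;/pre>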
&lt;p>But, even if you&amp;rsquo;re not using arrays as a datatype there&amp;rsquo;s often a time when you want to rollup something like an array in a query then comma separate it. Something similar to the following could allow you to easily roll up a comma separated list of projects per user:&lt;/p>
&lt;pre>&lt;code>SELECT
  users.email,
  array_to_string(array_agg(projects.name), ',') as projects
FROM
  projects,
  tasks,
  users
WHERE projects.id = tasks.project_id
  AND tasks.due_at &amp;gt; tasks.completed_at
  AND tasks.due_at &amp;gt; now()
  AND users.id = projects.user_id
GROUP BY
  users.email
&lt;/code>&lt;/pre>
&lt;h3 id="6-use-materialized-views-cautiously" >
&lt;div>
6. Use materialized views cautiously
&lt;/div>
&lt;/h3>
&lt;p>If you&amp;rsquo;re not familiar with materialized views, they&amp;rsquo;re a query that has actually been created as a table. So it&amp;rsquo;s a materialized, or basically snapshotted, version of some query or &amp;ldquo;view&amp;rdquo;. In their initial version materialized views, which were long requested in Postgres, were entirely unusable because refreshing them was a locking transaction which could hold up other reads and activities against that view.&lt;/p>
&lt;p>They&amp;rsquo;ve since gotten much better, but there&amp;rsquo;s no tooling for refreshing them out of the box. This means you have to set up some scheduler or cron job to regularly refresh your materialized views. If you&amp;rsquo;re building some reporting or BI app you may well need them, but their usability could still be improved so that Postgres knew how to refresh them more automatically.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re on Postgres 9.3, the above caveats about preventing reads still exist&lt;/em>&lt;/p>
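&lt;p>A minimal sketch (with a hypothetical table), including the non-blocking refresh available as of 9.4:&lt;/p>
&lt;pre>&lt;code>CREATE MATERIALIZED VIEW daily_signups AS
  SELECT date_trunc('day', created_at) AS day,
         count(*) AS signups
  FROM users
  GROUP BY 1;

-- a unique index is required for CONCURRENTLY
CREATE UNIQUE INDEX ON daily_signups (day);

-- on 9.4+ this refreshes without blocking reads
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_signups;
&lt;/code>&lt;/pre>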
&lt;h3 id="7-window-functions" >
&lt;div>
7. Window functions
&lt;/div>
&lt;/h3>
&lt;p>Window functions are perhaps still one of the more complex parts of SQL to understand. In short they let you order the results of a query, then compute something from one row to the next, something generally hard to do without procedural SQL. You can do some very basic things with them such as rank where &lt;a href="http://postgresguide.com/sql/window.html">each result appears&lt;/a> ordered by some value, or something more complex like compute &lt;a href="http://www.craigkerstiens.com/2014/02/26/Tracking-MoM-growth-in-SQL/">MoM growth directly in SQL&lt;/a>.&lt;/p>
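&lt;p>A small taste of the ranking case (with a hypothetical table):&lt;/p>
&lt;pre>&lt;code>SELECT email,
       created_at,
       rank() OVER (ORDER BY created_at) AS signup_rank
FROM users
ORDER BY signup_rank;
&lt;/code>&lt;/pre>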
&lt;h3 id="8-a-simpler-method-for-pivot-tables" >
&lt;div>
8. A simpler method for pivot tables
&lt;/div>
&lt;/h3>
&lt;p>The &lt;code>tablefunc&lt;/code> extension is often referenced as the way to compute a pivot table in Postgres. Sadly though it&amp;rsquo;s pretty difficult to use, and the more basic method would be to just do it with raw SQL. This will get much better with &lt;a href="http://www.craigkerstiens.com/2015/12/27/postgres-9-5-feature-rundown/">Postgres 9.5&lt;/a>, but until then, a query that sums up a case expression for each condition is much simpler to reason about:&lt;/p>
&lt;pre>&lt;code>select date,
sum(case when type = 'OSX' then val end) as osx,
sum(case when type = 'Windows' then val end) as windows,
sum(case when type = 'Linux' then val end) as linux
from daily_visits_per_os
group by date
order by date
limit 4;
&lt;/code>&lt;/pre>
&lt;p>&lt;em>Example query courtesy of &lt;a href="http://www.twitter.com/tapoueh">Dimitri Fontaine&lt;/a> and &lt;a href="http://tapoueh.org/blog/2013/07/04-Simple-case-for-pivoting">his blog&lt;/a>.&lt;/em>&lt;/p>
&lt;h3 id="9-postgis" >
&lt;div>
9. PostGIS
&lt;/div>
&lt;/h3>
&lt;p>Sadly on this one I&amp;rsquo;m far from an expert. PostGIS is arguably the best of the GIS database options. The fact that you get all of the standard Postgres benefits with it makes it even more powerful–a great example of this is GiST indexes, which offer great performance gains for PostGIS.&lt;/p>
&lt;p>If you&amp;rsquo;re doing something with geospatial data and need something more than the easy to use &lt;code>earth_distance&lt;/code> extension then crack open PostGIS.&lt;/p>
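&lt;p>For the simple cases, &lt;code>earth_distance&lt;/code> may be all you need; a rough sketch (the coordinates are San Francisco and New York):&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION cube;
CREATE EXTENSION earthdistance;

-- great-circle distance in meters between two lat/long points
SELECT earth_distance(
         ll_to_earth(37.7749, -122.4194),
         ll_to_earth(40.7128, -74.0060));
&lt;/code>&lt;/pre>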
&lt;h3 id="10-jsonb" >
&lt;div>
10. JSONB
&lt;/div>
&lt;/h3>
&lt;p>I almost debated leaving this one off the list; ever since Postgres 9.2, JSON has been one of the marquee features in each Postgres release. JSON arrived with much hype, and JSONB fulfilled on the initial hype of Postgres starting to truly compete as a document database. JSONB only continues to become more powerful with &lt;a href="http://www.craigkerstiens.com/2015/12/08/massive-json/">better libraries&lt;/a> for taking advantage of it, and with its &lt;a href="https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#JSONB-modifying_operators_and_functions">functions improving&lt;/a> each release.&lt;/p>
&lt;p>If you&amp;rsquo;re doing anything with JSON, or playing with another document database while ignoring JSONB, you&amp;rsquo;re missing out. And of course, don&amp;rsquo;t forget the GIN and GiST indexes to really get the benefits of it.&lt;/p>
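&lt;p>To make containment queries fast, a GIN index is the usual approach; a sketch with a hypothetical table:&lt;/p>
&lt;pre>&lt;code>CREATE TABLE events (
  id      serial PRIMARY KEY,
  payload jsonb
);

CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- containment query that can use the index
SELECT id
FROM events
WHERE payload @&amp;gt; '{"type": "signup"}';
&lt;/code>&lt;/pre>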
&lt;h3 id="the-year-ahead" >
&lt;div>
The year ahead
&lt;/div>
&lt;/h3>
&lt;p>Postgres 9.5/9.6 should continue to improve and bring many new features in the years ahead. What&amp;rsquo;s something that doesn&amp;rsquo;t exist yet that you&amp;rsquo;d like to see land in Postgres? Let me know &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>&lt;/p></description></item><item><title>Postgres 9.5 - The feature rundown</title><link>/2015/12/27/Postgres-9.5-The-feature-rundown/</link><pubDate>Sun, 27 Dec 2015 12:55:56 -0800</pubDate><guid>/2015/12/27/Postgres-9.5-The-feature-rundown/</guid><description>&lt;p>The headline of Postgres 9.5 is undoubtedly: Insert&amp;hellip; on conflict do nothing/update, more commonly known as Upsert or Merge. This removes one of the last remaining features which other databases had over Postgres. Sure we&amp;rsquo;ll take a look at it, but first let&amp;rsquo;s browse through some of the other features you can look forward to when Postgres 9.5 lands:&lt;/p>
&lt;h3 id="grouping-sets-cube-rollup" >
&lt;div>
Grouping sets, cube, rollup
&lt;/div>
&lt;/h3>
&lt;p>Pivoting in Postgres has &lt;a href="http://www.craigkerstiens.com/2013/06/27/Pivoting-in-Postgres/">sort of been possible&lt;/a>, as has rolling up data, but it required the values you were projecting to to be known up front. With the new functionality that allows you to group various sets together, rollups as you&amp;rsquo;d normally expect to do in something like Excel become trivial.&lt;/p>
&lt;p>So now instead you simply add the grouping type just as you would on a normal group by:&lt;/p>
&lt;pre>&lt;code>SELECT department, role, gender, count(*)
FROM employees
GROUP BY your_grouping_type_here;
&lt;/code>&lt;/pre>
&lt;p>By simply selecting the type of rollup you want to do Postgres will do the hard work for you. Let&amp;rsquo;s take a look at the given example of department, role, gender:&lt;/p>
&lt;ul>
&lt;li>&lt;code>grouping sets&lt;/code> will project out the count for each specific key. As a result you&amp;rsquo;d get each department key, with the other keys as null, and the count of rows for that department.&lt;/li>
&lt;li>&lt;code>cube&lt;/code> will give you the same values as above, but also the rollups of every individual combination. So in addition to the total for each department, you&amp;rsquo;d get breakdowns by department and gender, by department and role, and by department, role, and gender.&lt;/li>
&lt;li>&lt;code>rollup&lt;/code> will give you something similar to cube, but only the detailed groupings in the order they&amp;rsquo;re presented. So if you specified &lt;code>rollup (department, role, gender)&lt;/code> you&amp;rsquo;d have no rollup for department and gender alone.&lt;/li>
&lt;/ul>
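&lt;p>Concretely, using the rollup variant with the earlier example:&lt;/p>
&lt;pre>&lt;code>SELECT department, role, gender, count(*)
FROM employees
GROUP BY ROLLUP (department, role, gender);
&lt;/code>&lt;/pre>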
&lt;p>&lt;em>Check the what&amp;rsquo;s new wiki for a bit more clarity on &lt;a href="https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#GROUPING_SETS.2C_CUBE_and_ROLLUP">examples and output&lt;/a>&lt;/em>&lt;/p>
&lt;h3 id="import-foreign--schemas" >
&lt;div>
Import foreign schemas
&lt;/div>
&lt;/h3>
&lt;p>I only use foreign tables about once a month, but when I do use them they&amp;rsquo;ve invariably saved many hours of creating a one off ETL process. Even still, the effort to set up new foreign tables has shown a bit of their infancy in Postgres. Now once you&amp;rsquo;ve set up your foreign database, you can import the schema, either all of it or just the specific tables you prefer.&lt;/p>
&lt;p>It&amp;rsquo;s as simple as:&lt;/p>
&lt;pre>&lt;code>IMPORT FOREIGN SCHEMA public
FROM SERVER some_other_db INTO reference_to_other_db;
&lt;/code>&lt;/pre>
&lt;h3 id="pg_rewind" >
&lt;div>
pg_rewind
&lt;/div>
&lt;/h3>
&lt;p>If you&amp;rsquo;re managing your own Postgres instance for some reason and running HA, pg_rewind could become especially handy. Typically to spin up replication you have to first download the physical, also known as base, backup. Then you have to replay the Write-Ahead-Log, or WAL, so it&amp;rsquo;s up to date, and then you actually flip on replication.&lt;/p>
&lt;p>Typically with databases when you fail over you shoot the other node in the head, or &lt;a href="https://en.wikipedia.org/wiki/STONITH">STONITH&lt;/a>: get rid of it, completely throw it out. This is still a good practice, so bring it offline and make it inactive, but from there you can now use pg_rewind to resynchronize it against the new primary rather than rebuilding it from scratch. This could save you pulling down lots and lots of data to get a replica back up once you&amp;rsquo;ve failed over.&lt;/p>
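&lt;p>As a rough sketch, resynchronizing the old primary looks something like this (the data directory and connection string are placeholders, and pg_rewind requires &lt;code>wal_log_hints&lt;/code> or data checksums to have been enabled on the cluster):&lt;/p>
&lt;pre>&lt;code>pg_rewind --target-pgdata=/var/lib/postgresql/9.5/main \
  --source-server='host=new-primary port=5432 user=postgres'
&lt;/code>&lt;/pre>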
&lt;h3 id="upsert" >
&lt;div>
Upsert
&lt;/div>
&lt;/h3>
&lt;p>Upsert of course will be the highlight of Postgres 9.5. I already talked about it some when &lt;a href="http://www.craigkerstiens.com/2015/05/08/upsert-lands-in-postgres-9.5/">it initially landed&lt;/a>. The short of it is, if you&amp;rsquo;re inserting a record and there&amp;rsquo;s a conflict, you can choose to:&lt;/p>
&lt;ul>
&lt;li>Do nothing&lt;/li>
&lt;li>Do some form of update&lt;/li>
&lt;/ul>
&lt;p>Essentially this will let you have the typical create-or-update experience that most frameworks provide, but without the potential race condition that can leave you with incorrect data.&lt;/p>
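&lt;p>As a sketch, given a hypothetical &lt;code>users&lt;/code> table with a unique email, the two forms look like:&lt;/p>
&lt;pre>&lt;code>INSERT INTO users (email, visits) VALUES ('foo@example.com', 1)
ON CONFLICT (email) DO NOTHING;

INSERT INTO users (email, visits) VALUES ('foo@example.com', 1)
ON CONFLICT (email) DO UPDATE SET visits = users.visits + 1;
&lt;/code>&lt;/pre>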
&lt;h3 id="jsonb-pretty" >
&lt;div>
JSONB pretty
&lt;/div>
&lt;/h3>
&lt;p>There are a few updates &lt;a href="https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#JSONB-modifying_operators_and_functions">to JSONB&lt;/a>. The one I&amp;rsquo;m most excited about makes JSONB output in psql read much more legibly.&lt;/p>
&lt;p>If you&amp;rsquo;ve got a JSONB field just give it a try with:&lt;/p>
&lt;pre>&lt;code>SELECT jsonb_pretty(jsonb_column)
FROM foo;
&lt;/code>&lt;/pre>
&lt;h3 id="give-it-a-try" >
&lt;div>
Give it a try
&lt;/div>
&lt;/h3>
&lt;p>Just in time for the new year &lt;a href="http://www.postgresql.org/about/news/1631/">the RC is ready&lt;/a> and you can get hands on with it. Give it a try, and if there&amp;rsquo;s more you&amp;rsquo;d like to hear about Postgres please feel free to drop me a note &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a>.&lt;/p>
</description></item><item><title>Going from blog posts to full launches</title><link>/2015/12/26/Going-from-blog-posts-to-full-launches/</link><pubDate>Sat, 26 Dec 2015 12:55:56 -0800</pubDate><guid>/2015/12/26/Going-from-blog-posts-to-full-launches/</guid><description>&lt;p>I recall the extremely early stage where you&amp;rsquo;d build a feature, realize it was awesome, then write a blog post for it the next day. At some point you start to move from that to more coordinated launches. A larger coordinated launch allows you to reach a bigger audience, can lead to bigger deals, and helps expand your overall market. But perhaps more importantly, by the time you hit full launch you&amp;rsquo;ve message tested and ensured it&amp;rsquo;s going to resonate in the way you expect.&lt;/p>
&lt;p>&lt;em>The process itself will both help amplify and validate/refine your message&lt;/em>&lt;/p>
&lt;p>This is often a gradual process rather than a sudden single change; you&amp;rsquo;ll introduce new parts of it over time. And for many, what an entire launch process looks like comes by trial and error. To help shorten that learning curve, here are the key areas I pay attention to for a launch and its process, followed by a rough timeline.&lt;/p>
&lt;h3 id="product-first" >
&lt;div>
Product first
&lt;/div>
&lt;/h3>
&lt;p>Making sure the product is in the right shape is key to any big launch. You don&amp;rsquo;t get a second shot and if the product isn&amp;rsquo;t in shape customers often won&amp;rsquo;t take a second look at it later. For this reason I strongly prefer to have your product locked and loaded before you even start talking launch times, or at least be in the bug clean up phase. This means you&amp;rsquo;ve built a feature, validated with alpha users or private beta, and are ready to open it up to the world.&lt;/p>
&lt;p>If you have to set a launch date without the product or feature already being done, allow padding. Sometimes it&amp;rsquo;s good for the team to know the padding, sometimes it isn&amp;rsquo;t. When you have extra time it&amp;rsquo;s not uncommon for your development to magically consume exactly that amount of time and still result in a small scramble towards the end.&lt;/p>
&lt;p>A good driver I&amp;rsquo;ve found is needing to have it fully ready to demo a few weeks out from the launch itself, such as during &lt;a href="">analyst pre-briefings&lt;/a>.&lt;/p>
&lt;h3 id="crafting-your-message" >
&lt;div>
Crafting your message
&lt;/div>
&lt;/h3>
&lt;p>Every launch is an opportunity to tell your core message and value prop. If you miss this opportunity by focusing on a single narrow feature, you&amp;rsquo;ve missed the biggest opportunity you had in a launch. You can&amp;rsquo;t relaunch your full product every time though–you do need some big improvement or feature that you can highlight, but you should still hit your core message.&lt;/p>
&lt;p>First, you should know that your feature solves some specific problem; you should know this from alpha/beta testing, and if it doesn&amp;rsquo;t solve that problem &lt;a href="/2014/08/13/when-to-ship-when-to-kill/">you&amp;rsquo;re not ready to launch&lt;/a>. Yes, some people will launch a product before the product is completely there–this is common in a marketing-driven company as opposed to a product/engineering-driven company.&lt;/p>
&lt;p>Your message should lead with the problem you&amp;rsquo;re solving, not the laundry list of features. The best launches lead with some broader thematic message, even better if it&amp;rsquo;s an altruistic world changing one. A rough example of this:&lt;/p>
&lt;p>To the point of the product, and probably over-generalized and boring:&lt;/p>
&lt;ul>
&lt;li>Connectify brings a new way of taking your dumb devices at home and turning them into intelligent connected devices.&lt;/li>
&lt;/ul>
&lt;p>In contrast, a broad thematic message, followed lightly by the product:&lt;/p>
&lt;ul>
&lt;li>We live in a connected world, and with new connected devices there&amp;rsquo;s the opportunity not just to give you more data but to help you improve how you live your life. Connectify helps you bring the devices that matter together with ease.&lt;/li>
&lt;/ul>
&lt;h3 id="testing-your-message" >
&lt;div>
Testing your message
&lt;/div>
&lt;/h3>
&lt;p>You should treat your message just like a product, testing it gradually along the way. Once you&amp;rsquo;ve got some initial framing of it, test it internally, then with friendly customers or community members. Leading up to the launch I usually have a timeline and get all the content and communication rolling about 3 weeks out. I&amp;rsquo;ll give a bit of a timeline below, but first some more around message testing.&lt;/p>
&lt;h4 id="analysts" >
&lt;div>
Analysts
&lt;/div>
&lt;/h4>
&lt;p>If you regularly use &lt;a href="/2015/07/25/A-guide-to-analyst-relations-for-startups/">any analysts&lt;/a> you should absolutely use them to help with a launch. Several weeks out is a great time to test key messages with them, get feedback, and if you&amp;rsquo;re lucky you may even get them to provide a quote for the launch.&lt;/p>
&lt;p>Keep in mind here an inquiry is an opportunity to test your message and get feedback. You should talk roughly half of the time; they should be talking the other half. In contrast, in briefings before launch you should have your message fully baked and simply be pitching it and possibly demoing.&lt;/p>
&lt;h4 id="friendlies" >
&lt;div>
Friendlies
&lt;/div>
&lt;/h4>
&lt;p>This may be more contentious, but at least at an early stage, sharing drafts with friendly community members is a great way to get feedback and refine your message. Here you should be especially conscious that you&amp;rsquo;re asking for their time, and expect some delay before they get back to you. Being top secret about your message ahead of time won&amp;rsquo;t do much to make it a home run, whereas ensuring it resonates will help it be more successful.&lt;/p>
&lt;h4 id="customers" >
&lt;div>
Customers
&lt;/div>
&lt;/h4>
&lt;p>Customers I call out as a separate bucket. They have less incentive to leak your news than friendlies, but fall somewhere else on the spectrum between friendlies and analysts. A key piece about customers is there&amp;rsquo;s an opportunity for them to be a launch partner. And so on to that topic:&lt;/p>
&lt;h3 id="launch-partners" >
&lt;div>
Launch partners
&lt;/div>
&lt;/h3>
&lt;p>Press and others like seeing and knowing you have external validation. Similarly many see the benefit of being part of a launch, after all it&amp;rsquo;s more free press for them. For a launch partner there are various levels, though for most providing some quote is a pretty common level.&lt;/p>
&lt;p>The best way to do this is to talk to them about what they like about the feature/product and take a first stab at the quote for them from their feedback. Some may very much want to wordsmith their own, which is fine, but you can best minimize the work required of them, while hitting both something they&amp;rsquo;d actually say and a message that flows, by taking the first pass yourself.&lt;/p>
&lt;p>Further there&amp;rsquo;s varying levels of value with quotes and references. In descending order:&lt;/p>
&lt;ol>
&lt;li>Customers&lt;/li>
&lt;li>Analysts&lt;/li>
&lt;li>Community Members&lt;/li>
&lt;/ol>
&lt;h3 id="the-other-details" >
&lt;div>
The other details
&lt;/div>
&lt;/h3>
&lt;p>During launch week I mostly want to be dotting i&amp;rsquo;s and crossing t&amp;rsquo;s, meaning: I want the product done. I want documentation done. I want the blog post finalized. I want to be in the mode of send internal announcements, prep internal teams, talk to media.&lt;/p>
&lt;h4 id="prepping-internal-teams" >
&lt;div>
Prepping internal teams
&lt;/div>
&lt;/h4>
&lt;p>Obviously the engineering and product people involved will be in the loop. But you need to notify many others, some of whom should have been in the loop already, some less so:&lt;/p>
&lt;ul>
&lt;li>Support - There&amp;rsquo;s a new product surface area, support should be top of your list so they can field the tickets and questions that come in&lt;/li>
&lt;li>Sales - Even if there is no price change or impact, new features allow sales to communicate value to customers&lt;/li>
&lt;/ul>
&lt;h3 id="timeline" >
&lt;div>
Timeline
&lt;/div>
&lt;/h3>
&lt;p>Finally, what does the end-to-end timeline look like with all the little details? Here&amp;rsquo;s a rough one that&amp;rsquo;s fully built out. If you&amp;rsquo;re smaller and don&amp;rsquo;t have a regular cadence with analysts in hand, just expect that part doesn&amp;rsquo;t apply. If your support team is the product and engineering team, maybe that&amp;rsquo;s lighter weight. Basically, feel free to take out parts, but expect your process to grow to something of this size.&lt;/p>
&lt;ul>
&lt;li>4 weeks out - Outline of blog post with key messages&lt;/li>
&lt;li>Test that outline internally&lt;/li>
&lt;li>3 weeks out - Start to get a rough draft in place&lt;/li>
&lt;li>3 weeks out - Share internally and with friendlies. At this point you&amp;rsquo;re explicitly looking for message feedback. Tell people not to waste time on nitpicks of words or grammar; it will be 98% re-written by the time you&amp;rsquo;re done&lt;/li>
&lt;li>2.5 weeks out - Analyst inquiries for message testing&lt;/li>
&lt;li>2.5 weeks out - Start putting together product demo&lt;/li>
&lt;li>2 weeks out - Start putting together documentation&lt;/li>
&lt;li>2 weeks out - Start nailing down blog post for final messages&lt;/li>
&lt;li>1 week out - Start to put final touches on blog post for grammar&lt;/li>
&lt;li>1 week out - Analyst briefings&lt;/li>
&lt;li>1 week out - Update support&lt;/li>
&lt;li>3-5 days out - Stage blog post&lt;/li>
&lt;li>3-5 days out - Stage new documentation&lt;/li>
&lt;li>2-4 days out - Make sure PRs are ready or feature flags, in short the switch is there or live but not public&lt;/li>
&lt;li>1-3 days out - Update sales&lt;/li>
&lt;li>1-3 days out - Internal communication to all@&lt;/li>
&lt;li>1-3 days out - Media briefings&lt;/li>
&lt;li>LAUNCH DAY - Sit in a room and watch all the things, engage with twitter/HN/etc.&lt;/li>
&lt;/ul></description></item><item><title>Postgres and Node - Hands on using Postgres as a Document Store with MassiveJS</title><link>/2015/12/08/Postgres-and-Node-Hands-on-using-Postgres-as-a-Document-Store-with-MassiveJS/</link><pubDate>Tue, 08 Dec 2015 12:55:56 -0800</pubDate><guid>/2015/12/08/Postgres-and-Node-Hands-on-using-Postgres-as-a-Document-Store-with-MassiveJS/</guid><description>&lt;p>JSONB in Postgres is absolutely awesome, but it&amp;rsquo;s taken a little while for libraries to come around to make it as useful as would be ideal. For those not following along with Postgres lately, here&amp;rsquo;s the quick catchup for it as a NoSQL database.&lt;/p>
&lt;ul>
&lt;li>In Postgres 8.3, over 5 years ago, Postgres received &lt;a href="http://www.craigkerstiens.com/2013/07/03/hstore-vs-json/">hstore, a key/value&lt;/a> store directly in Postgres. Its big limitation was that it was text only&lt;/li>
&lt;li>In the years after it got GIN and GiST indexes to make queries over hstore extremely fast indexing the entire collection&lt;/li>
&lt;li>In Postgres 9.2 we got JSON&amp;hellip; sort of. Really this was only text validation, but it allowed us to create some &lt;a href="http://www.craigkerstiens.com/2013/05/29/postgres-indexes-expression-or-functional-indexes/">functional indexes&lt;/a> which were still nice.&lt;/li>
&lt;li>In Postgres 9.4 we got JSONB - the B stands for Better according to &lt;a href="http://www.twitter.com/leinweber">@leinweber&lt;/a>. Essentially this is a full binary JSON on disk, which can perform as fast as other NoSQL databases using JSON.&lt;/li>
&lt;/ul>
&lt;p>This is all great, but when it comes to using JSON you need a library that plays well here. As you might have guessed from &lt;a href="http://www.craigkerstiens.com/2015/11/30/massive-node-postgres-an-overview-and-intro/">my previous post, this is where MassiveJS comes in&lt;/a>. Most ORMs take a more legacy approach to &lt;a href="http://www.craigkerstiens.com/2014/01/24/rethinking-limits-on-relational/">how they work with the database&lt;/a>, while the other side of the world believes document-only storage is the future. Postgres, in contrast, believes there is a time and place for everything, and Massive takes the same view, betting on Postgres as the path &lt;a href="http://www.craigkerstiens.com/2012/04/30/why-postgres/">just as I do&lt;/a>.&lt;/p>
&lt;p>Alright, enough context, let&amp;rsquo;s take a look.&lt;/p>
&lt;h3 id="getting-all-setup" >
&lt;div>
Getting all setup
&lt;/div>
&lt;/h3>
&lt;p>First go ahead and create a database, let&amp;rsquo;s call it massive, and then let&amp;rsquo;s connect to it and create our example table:&lt;/p>
&lt;pre>&lt;code>$ createdb massive
$ psql massive
# create table posts (id serial primary key, body jsonb);
&lt;/code>&lt;/pre>
&lt;p>Now that we&amp;rsquo;ve got our database set up, let&amp;rsquo;s seed it with some data. If you want you can simply hop over to the github repo, pull it down, and run &lt;code>node load_json.js&lt;/code> to load the example data. A quick look at it: given an &lt;code>example.json&lt;/code> file, we&amp;rsquo;re going to iterate over it and call saveDoc for each record in there. Based on our table, which has a unique id key and a body jsonb field, it&amp;rsquo;ll simply save our JSON document into that table:&lt;/p>
&lt;pre>&lt;code>var massive = require('massive');
var db = massive.connectSync({connectionString : &amp;quot;postgres://localhost/massive&amp;quot;});

var parsedJSON = require('./example.json');

for(var i = 0; i &amp;lt; parsedJSON.posts.length; i++) {
  db.saveDoc(&amp;quot;posts&amp;quot;, parsedJSON.posts[i], function(err,res){});
}
&lt;/code>&lt;/pre>
&lt;p>&lt;em>If you want to just take a look at this &lt;a href="https://github.com/craigkerstiens/json_node_example">github repo&lt;/a>, once you create a database you can run &lt;code>node load_json.js&lt;/code> to seed it.&lt;/em>&lt;/p>
&lt;h3 id="why-json-at-all" >
&lt;div>
Why JSON at all?
&lt;/div>
&lt;/h3>
&lt;p>JSON data is all over the place; in many cases it&amp;rsquo;s fast and flexible and allows you to move more quickly. Yes, much of the time normalizing your data can be useful, but there is something to be said for the expediency of saving some data and querying across it. Querying across some giant document also used to be much more expensive, but now with JSONB and its indexes that can be extremely fast.&lt;/p>
&lt;h3 id="querying" >
&lt;div>
Querying
&lt;/div>
&lt;/h3>
&lt;p>So how do we go about querying? Well it&amp;rsquo;s pretty simple with Massive, they provide a nice &lt;code>findDoc&lt;/code> function to let you just search for contents of a specific key within the document. Let&amp;rsquo;s say I wanted to pull back all posts that are in the Postgres category, well it&amp;rsquo;s as simple as:&lt;/p>
&lt;pre>&lt;code>db.posts.findDoc({category : 'Postgres'}, function(err,docs){
console.log(docs);
});
&lt;/code>&lt;/pre>
&lt;p>The real beauty of this is if you added a GIN index (which will index the entire document) this query will be &lt;a href="http://obartunov.livejournal.com/175235.html">quite performant&lt;/a>.&lt;/p>
&lt;p>&lt;em>Just make sure to add your GIN index&lt;/em>:&lt;/p>
&lt;pre>&lt;code>CREATE INDEX idx_posts ON posts USING GIN(body jsonb_path_ops);
CREATE INDEX idx_posts_search ON posts USING GIN(search);
&lt;/code>&lt;/pre>
&lt;p>But even better, with Massive it&amp;rsquo;ll automatically add these for you if you just start saving docs. It will automatically create the table and appropriate indexes, just doing the correct thing out of the box.&lt;/p>
&lt;h3 id="full-text-and-json" >
&lt;div>
Full text and JSON
&lt;/div>
&lt;/h3>
&lt;p>Cool, so you can do an exact look up. Which is great when you&amp;rsquo;re matching a category&amp;hellip; which could be easily normalized. It&amp;rsquo;s great when you&amp;rsquo;re matching numbers, which also could likely reside in their own column. But what about when you&amp;rsquo;re searching over a large document, or a set of keys within some document which may require several joins, or indeterminate data structure, well you may want to search for the presence of that string at all. As you may have guessed this is quite trivial.&lt;/p>
&lt;pre>&lt;code>db.posts.searchDoc({
keys : [&amp;quot;title&amp;quot;, &amp;quot;category&amp;quot;],
term : &amp;quot;Postgres&amp;quot;
}, function(err,docs){
console.log(docs);
})
&lt;/code>&lt;/pre>
&lt;p>Hopefully it&amp;rsquo;s pretty straightforward, but to be very clear: call out the document table you want to search, then the keys you want included in the search, then the term. This will search for any place that string is found within the values for those keys.&lt;/p>
&lt;p>Which will nicely yield the expected documents:&lt;/p>
&lt;pre>&lt;code>[ { link: 'http://www.craigkerstiens.com/2015/05/08/upsert-lands-in-postgres-9.5/',
title: 'Upsert Lands in PostgreSQL 9.5 – a First Look',
category: 'Postgres',
comments: [ [Object] ],
id: 2 },
{ link: 'http://www.craigkerstiens.com/2015/11/30/massive-node-postgres-an-overview-and-intro/',
title: 'Node, Postgres, MassiveJS - a Better Database Experience',
id: 3 } ]
&lt;/code>&lt;/pre>
&lt;h3 id="in-conclusion" >
&lt;div>
In conclusion
&lt;/div>
&lt;/h3>
&lt;p>While Massive isn&amp;rsquo;t perfect, its approach of storing queries in files, using the schema rather than having to define your models in both code and the database, and its smooth document integration make it a real contender as a better database library when working with Node. Give it a try and let me know your thoughts.&lt;/p></description></item><item><title>Node, Postgres, MassiveJS - A better database experience</title><link>/2015/11/30/Node-Postgres-MassiveJS-A-better-database-experience/</link><pubDate>Mon, 30 Nov 2015 12:55:56 -0800</pubDate><guid>/2015/11/30/Node-Postgres-MassiveJS-A-better-database-experience/</guid><description>&lt;p>First some background–I&amp;rsquo;ve always had a bit of a love hate relationship with ORMs. ORMs are great for basic CRUD applications, which inevitably come up at some point for an app. The two main problems I have with ORMs are:&lt;/p>
&lt;ol>
&lt;li>They treat all databases as equal (yes, this is a little overgeneralized but typically true). They claim to do this for database portability, but in reality an app still can&amp;rsquo;t just up and move from one to another.&lt;/li>
&lt;li>They don&amp;rsquo;t handle complex queries well at all. As someone that sees SQL as a very powerful language, taking away all the power just leaves me with pain.&lt;/li>
&lt;/ol>
&lt;p>&lt;em>Of course these aren&amp;rsquo;t the &lt;a href="https://kev.inburke.com/kevin/faster-correct-database-queries/">only issues&lt;/a> with them, just the two ones I personally run into over and over.&lt;/em>&lt;/p>
&lt;p>In some playing with Node I was optimistic to explore &lt;a href="http://massive-js.readthedocs.org">Massive.JS&lt;/a> as it seems to buck the trend of just imitating all other ORMs. My initial results–it makes me want to do more with Node just for this library. After all, the power of a language is the ecosystem of libraries around it, not just the core language. So let&amp;rsquo;s take a quick tour through a few highlights of what makes it really great.&lt;/p>
&lt;h3 id="getting-setup" >
&lt;div>
Getting setup
&lt;/div>
&lt;/h3>
&lt;p>Without further ado, here&amp;rsquo;s a quick tour around it.&lt;/p>
&lt;p>First let&amp;rsquo;s pull down the example database from &lt;a href="http://postgresguide.com/setup/example.html">PostgresGuide&lt;/a>&lt;/p>
&lt;p>Then let&amp;rsquo;s set up our Node app:&lt;/p>
&lt;pre>&lt;code>$ npm init
$ npm install massive --save
&lt;/code>&lt;/pre>
&lt;h3 id="connecting-and-querying" >
&lt;div>
Connecting and querying
&lt;/div>
&lt;/h3>
&lt;p>Now let&amp;rsquo;s try to connect and say query a user from within our database. Create the following as an &lt;code>index.js&lt;/code> file, then run with &lt;code>node index.js&lt;/code>:&lt;/p>
&lt;pre>&lt;code>var massive = require(&amp;quot;massive&amp;quot;);
var connectionString = &amp;quot;postgres://ckerstiens:@localhost/example&amp;quot;;
var db = massive.connectSync({connectionString : connectionString});
db.users.find(1, function(err,res){
console.log(res);
});
&lt;/code>&lt;/pre>
&lt;p>Upon first run if you&amp;rsquo;re like me and use the &lt;a href="http://postgresguide.com/setup/example.html">PostgresGuide example database&lt;/a> (which I now need to go back and tidy up), you&amp;rsquo;ll get the following:&lt;/p>
&lt;pre>&lt;code>db.users.find(1, function(err,res){
^
TypeError: Cannot read property 'find' of undefined
&lt;/code>&lt;/pre>
&lt;p>I can&amp;rsquo;t describe how awesome it is to see this. What&amp;rsquo;s happening is that when Massive loads up it connects to your database and checks what tables you have. In this case, because we don&amp;rsquo;t have a proper primary key defined, it doesn&amp;rsquo;t load the table. It could of course treat id as some magical field like Rails used to and ignore the need for a key, but instead it not only encourages good database design but requires it.&lt;/p>
&lt;p>So let&amp;rsquo;s go back and create our index in our database:&lt;/p>
&lt;pre>&lt;code>$ psql example
# alter table users add primary key (id);
&lt;/code>&lt;/pre>
&lt;p>Alright now let&amp;rsquo;s run our script again with &lt;code>node index.js&lt;/code> and see what we have:&lt;/p>
&lt;pre>&lt;code>{ id: 1,
email: 'john.doe@gmail.com',
created_at: Thu Sep 24 2015 03:42:52 GMT-0700 (PDT),
deleted_at: null }
&lt;/code>&lt;/pre>
&lt;p>Perfect! Now we&amp;rsquo;re all connected and it even queried our database for us. Next let&amp;rsquo;s take a look at a few more of the operators.&lt;/p>
&lt;h3 id="running-an-arbitrary-query" >
&lt;div>
Running an arbitrary query
&lt;/div>
&lt;/h3>
&lt;p>&lt;code>db.run&lt;/code> will let me run any arbitrary SQL. For example, &lt;code>db.run(&amp;quot;select 'hello'&amp;quot;)&lt;/code> will produce &lt;code>[ { '?column?': 'hello' } ]&lt;/code>.&lt;/p>
&lt;p>This makes it nice and easy to break out of the standard ORM model and just run SQL.&lt;/p>
&lt;h3 id="find-for-quick-look-ups" >
&lt;div>
Find for quick look ups
&lt;/div>
&lt;/h3>
&lt;p>Similar to so many other database tools &lt;code>find&lt;/code> will offer you the most common quick look ups:&lt;/p>
&lt;pre>&lt;code>$ db.users.find({email: 'jane.doe@gmail.com'}, function(err, res){console.log(res)});
$ db.users.find({'created_at &amp;gt;': '2015-09-24'}, function(err, res){console.log(res)});
&lt;/code>&lt;/pre>
&lt;p>And of course there&amp;rsquo;s a where operator for multiple conditions.&lt;/p>
&lt;h3 id="structuring-queries-in-your-application" >
&lt;div>
Structuring queries in your application
&lt;/div>
&lt;/h3>
&lt;p>While in the next post I&amp;rsquo;ll dig deep into JSON, this is perhaps my favorite feature of Massive&amp;hellip; its design for pulling queries out into individual SQL files. Simply create a &lt;code>db&lt;/code> folder and put your SQL in there. Let&amp;rsquo;s take the most basic example of our user email lookup and put it in &lt;code>user_lookup.sql&lt;/code>:&lt;/p>
&lt;pre>&lt;code>SELECT *
FROM users
WHERE email = $1
&lt;/code>&lt;/pre>
&lt;p>Now back in our application we can run this and pass in a parameter to it:&lt;/p>
&lt;pre>&lt;code>db.user_lookup(['jane.doe@gmail.com'], function(err,res){
console.log(res);
});
&lt;/code>&lt;/pre>
&lt;p>This separation of our queries from our code makes it easier to track them, view diffs, and even more so &lt;a href="http://www.craigkerstiens.com/2012/11/17/how-i-write-sql/">create very readable SQL&lt;/a>.&lt;/p>
&lt;h3 id="up-next" >
&lt;div>
Up next
&lt;/div>
&lt;/h3>
&lt;p>So sure, you can connect to a database, you can query some things. There were a couple of small but more novel things that we blew through in here. First is the fact I didn&amp;rsquo;t have to define all my schema, it just knew it as &lt;a href="http://www.craigkerstiens.com/2014/01/24/rethinking-limits-on-relational/">it really should&lt;/a>. The separation of SQL queries you&amp;rsquo;ll custom write into files is simple, but will make for much more maintainable applications over the long term. And best of all is the JSON support, which I&amp;rsquo;ll get to soon&amp;hellip;&lt;/p></description></item><item><title>Seeding a sharing-economy or platform company</title><link>/2015/10/02/Seeding-a-sharing-economy-or-platform-company/</link><pubDate>Fri, 02 Oct 2015 12:55:56 -0800</pubDate><guid>/2015/10/02/Seeding-a-sharing-economy-or-platform-company/</guid><description>&lt;p>These days if you&amp;rsquo;re creating a company you likely hope to accomplish more with less people, two ways of doing this fall to: The sharing economy and creating a platform. It&amp;rsquo;s easy to see the case for this when you have such &lt;a href="http://graphics.wsj.com/billion-dollar-club/">unicorns&lt;/a> like AirBnB or Uber. The opportunity for each of those to compete against hotel chains or taxi services which each need to manage their own inventory is incredibly exciting and revolutionary. In a similar fashion platforms can offer much the same, Heroku&amp;rsquo;s platform and marketplace made it easier than ever for developers to click a button and get everything they needed years ago. It&amp;rsquo;s not just their code, it&amp;rsquo;s everything from &lt;a href="https://www.heroku.com/postgres">Postgres&lt;/a> to Mongo to &lt;a href="https://elements.heroku.com/addons#logging">Logging&lt;/a>. Or take the app store as example. 
Smart phones weren&amp;rsquo;t a new thing when the iPhone came out, but it was only the savviest of users that had apps installed on their Windows smartphone or BlackBerry. The app store made the iPhone different from any other phone by allowing others to build and improve it, turning the iPhone not into a phone but into a platform.&lt;/p>
&lt;p>Platforms and the sharing economy both let you get further without taking on the costs of offering the equivalent all on your own. And while it&amp;rsquo;s a great idea to venture into one of these two areas, starting them isn&amp;rsquo;t as trivial as simply deciding to. For both of these you have the issues of a two-sided market: first you have to convince the providers to come along, then the customers, or vice versa. As a result of this two-sided market issue, the easiest way to actually start is by bootstrapping it yourself – or faking it til you make it.&lt;/p>
&lt;p>What are some good examples of faking this? I&amp;rsquo;m sure you can probably find some good stories going back about AirBnB or Uber, but let&amp;rsquo;s assume times were different then. Let&amp;rsquo;s take a look at a very recent example: &lt;a href="http://techcrunch.com/2015/08/26/lugg-an-app-for-on-demand-short-distance-moves-raises-3-8-million/">Lugg&lt;/a>, which just launched in the latest batch of YC. Lugg is essentially Uber for moving, allowing you to request on demand that furniture be moved from one place to another. Early on, Lugg built their app, waited for requests to come in, and then the founders got in a truck and moved the furniture themselves. As a customer you were likely getting a great experience, without the founders ever tipping their hat to the ways they&amp;rsquo;re hacking the impression of being a large, well-oiled machine.&lt;/p>
&lt;p>But what about a platform? Slack continues to grow like wildfire as the new medium for communication. These days there are endless integrations for Slack, and I expect they&amp;rsquo;ll continue to expand what a platform for communication looks like. But a year ago they were quite a ways from having people show up at their door to add an integration. Sure there were people using them, but to expect github/trello/asana to immediately build an integration for every new flavor-of-the-week tool would be crazy. Yet, without these integrations Slack wouldn&amp;rsquo;t be nearly as useful as it is today–and probably wouldn&amp;rsquo;t have seen the growth it&amp;rsquo;s seen. In the early days of a platform the easiest way to get these integrations and partners in place is to show up and do the work yourself. Slack carried the weight of building these integrations early on, much as the Heroku add-ons team showed up at partners&amp;rsquo; offices and helped write the code to get them into the marketplace as providers. And while both Slack and Heroku are larger companies now, it still holds true for smaller ones starting today. &lt;a href="https://www.blockspring.com/">Blockspring&lt;/a>, a company which aims to make web services available through spreadsheets, had to do very much the same thing, building their initial integrations themselves. Now with their rapidly growing user base and already large collection of APIs they may be able to shift the model, but early on that wasn&amp;rsquo;t so much an option.&lt;/p>
&lt;p>If you want to build a platform, start by creating the impression of one while still carrying the load yourself. Yes, move to a true platform as soon as you can, but don&amp;rsquo;t wait for others to show up before you go that route.&lt;/p></description></item><item><title>A guide to analyst relations for startups</title><link>/2015/07/25/A-guide-to-analyst-relations-for-startups/</link><pubDate>Sat, 25 Jul 2015 12:55:56 -0800</pubDate><guid>/2015/07/25/A-guide-to-analyst-relations-for-startups/</guid><description>&lt;p>When it comes to go-to-market and marketing there are lots of pieces in a toolchest that all work together. One that comes a bit later, but if used properly (much like a &lt;a href="/2015/07/21/An-intro-PR-guide-for-startups/">PR agency&lt;/a>) can be valuable, is industry analysts. While how to work with a PR agency can quickly become clear, how to work with analysts so it is productive on both sides can take a bit longer to figure out, or at least it did for me. Even before you start working with them there&amp;rsquo;s the question of if, or when, you should. Here&amp;rsquo;s hoping this primer makes it a bit faster and easier for others.&lt;/p>
&lt;h3 id="what-is-an-analyst" >
&lt;div>
What is an analyst
&lt;/div>
&lt;/h3>
&lt;p>&lt;em>Apologies to all analysts, but of all parts of this post I might butcher this one&lt;/em>&lt;/p>
&lt;p>Analysts talk to a lot of companies, both the ones making products as well as the ones purchasing them. I&amp;rsquo;m not actually sure what the spread is; I&amp;rsquo;d guess 80-20. A large output of this and other activities is creating various reports and rankings. Gartner&amp;rsquo;s Magic Quadrant is probably the most well known industry ranking. Much of what they create isn&amp;rsquo;t freely available for consumption, so you likely don&amp;rsquo;t see the sheer volume of insights they put out.&lt;/p>
&lt;h3 id="why-would-you-engage-with-an-analyst" >
&lt;div>
Why would you engage with an analyst
&lt;/div>
&lt;/h3>
&lt;p>So what do they do for you? There are really two major buckets:&lt;/p>
&lt;p>&lt;strong>Help with sales/marketing&lt;/strong> - Given they&amp;rsquo;re informing and influencing buying decisions of businesses they can be one more person on your side. If a launch in Techcrunch makes business foo aware of product bar, then an analyst report or ranking can help sway a decision on whether to try bar vs. baz.&lt;/p>
&lt;p>&lt;strong>Consulting&lt;/strong> - The other major opportunity is for the analyst to give some form of guidance. In a larger company when you already have an established product they should absolutely be part of your launch process (more on that in a future post). They&amp;rsquo;re actively following your market and space, hopefully just as you are to some extent. They can offer an outside perspective and help with broad areas of focus and messaging.&lt;/p>
&lt;p>&lt;em>More&lt;/em> - In reality it’s not as clean cut as the above. They may be able to introduce you to good candidates for hiring. They may be able to introduce you to a large company interested in acquiring some capability which you have. They may be able to connect you with investors. All of these things can and do happen, but the above buckets typically are the primary drivers.&lt;/p>
&lt;h3 id="when" >
&lt;div>
When?
&lt;/div>
&lt;/h3>
&lt;p>First, engaging with analysts should always come after you have some confidence in the product, after you&amp;rsquo;ve started some marketing drumbeat, and after sales. In short, don&amp;rsquo;t be in a huge rush here; you&amp;rsquo;ll get there. As you start to get some attention and momentum it&amp;rsquo;s just as likely they&amp;rsquo;ll engage with you first as it is you reaching out. Also, marketing != sales; more on that in a future post.&lt;/p>
&lt;p>But, let&amp;rsquo;s assume you&amp;rsquo;ve got a product &lt;strong>which targets businesses&lt;/strong> (&lt;em>Analysts aren&amp;rsquo;t just for tech companies, though you’ll see the benefit here sooner if you’re, say, a database company than an HR product&lt;/em>). Let’s also assume you&amp;rsquo;ve got some sales and have some &lt;a href="/2015/07/21/An-intro-PR-guide-for-startups/">good launches under your belt&lt;/a>. If it starts to come up in sales calls whether you&amp;rsquo;re in any industry reports or rankings, or if you&amp;rsquo;re hearing about competitors having more validation in such reports, that may be an indicator it&amp;rsquo;s time. As a general rule of thumb, once you&amp;rsquo;ve got in-house PR they should be able to help guide and steer you to the right time.&lt;/p>
&lt;h3 id="so-how-do-you-engage-with-an-analyst" >
&lt;div>
So how do you engage with an analyst
&lt;/div>
&lt;/h3>
&lt;p>If you&amp;rsquo;re engaging in some form of report or article, that should be pretty self-explanatory: they&amp;rsquo;ll send you a questionnaire, you fill it out, and you go back and forth a little bit.&lt;/p>
&lt;p>However, the majority of my interactions aren&amp;rsquo;t on those articles and reports; for every one time I fill out lots of questions to help some report or ranking, I have 20 calls with an analyst.&lt;/p>
&lt;p>There are two primary calls you can have, an inquiry and a briefing.&lt;/p>
&lt;h4 id="inquiries" >
&lt;div>
Inquiries
&lt;/div>
&lt;/h4>
&lt;p>Inquiry is just a fancy word for a consulting call, and one you will always be paying for.&lt;/p>
&lt;p>&lt;em>A small detour here. The regularity and consistency in which you engage with an analyst makes a difference. They&amp;rsquo;re also people at the end of the day, so while firms have certain styles it&amp;rsquo;s even further multiplied by being very people driven. In your interactions you&amp;rsquo;ll have a different rapport with different people, it&amp;rsquo;s at a minimum important to be aware of this.&lt;/em>&lt;/p>
&lt;p>So back to an inquiry. Within an inquiry your goal is to pull back the curtain and give some backstage insights into what you&amp;rsquo;re doing and where you&amp;rsquo;re headed. This is typically under NDA, and you can trust an analyst&amp;rsquo;s NDA. It&amp;rsquo;s worthwhile to be as candid as you can here; yes it feels weird, but you&amp;rsquo;ll get the most value. They&amp;rsquo;re not like a reporter looking for a scoop (not that you can&amp;rsquo;t trust reporters, just know if you say it, it&amp;rsquo;s on record). You don&amp;rsquo;t have to restrict the entire call to one area, but areas of coverage are often:&lt;/p>
&lt;ul>
&lt;li>Upcoming products and major releases you’re working on&lt;/li>
&lt;li>Broader strategy and roadmap&lt;/li>
&lt;li>Get input on what they&amp;rsquo;re seeing and hearing from customers&lt;/li>
&lt;/ul>
&lt;h4 id="briefings" >
&lt;div>
Briefings
&lt;/div>
&lt;/h4>
&lt;p>The other type of call we have is a briefing. This is a little similar to a press briefing. You&amp;rsquo;ll get on the call, and walk through some upcoming launch or just give an update on your company and progress. The latter is more common if they&amp;rsquo;re unfamiliar with you or your product.&lt;/p>
&lt;p>Analyst briefings are good to do earlier than your press briefings, compared to press they&amp;rsquo;re like a bike with training wheels. It&amp;rsquo;s best if you still maintain your balance–the ride will be smoother, but there&amp;rsquo;s a little less risk of completely toppling over. One key difference is you often have a powerpoint deck you get to walk through during an analyst briefing. I&amp;rsquo;ve found this is helpful for pacing and key messages, I used to be skeptical, but now very much feel it&amp;rsquo;s always worth doing.&lt;/p>
&lt;p>&lt;em>Pro-tip: You can create a deck and use it for press too, no they won&amp;rsquo;t want to get on a gotomeeting, but you can send it over so they have more content later. BUT, more importantly you can also walk through it on your own screen if it helps with pacing.&lt;/em>&lt;/p>
&lt;p>Within a briefing you&amp;rsquo;ll have some ability to ask them questions at points. Does this resonate? Are you hearing similar? What are you seeing in the market? Don&amp;rsquo;t turn it into an inquiry, but knowing the parts that hit home for them allows you to refine your pitch for the next call.&lt;/p>
&lt;h4 id="engaging---the-tactical-parts" >
&lt;div>
Engaging - the tactical parts
&lt;/div>
&lt;/h4>
&lt;p>&amp;ldquo;Analysts are pretty much paid to talk and write&amp;rdquo; - &lt;a href="http://www.twitter.com/cote">@cote&lt;/a>. So expect that often when you occupy their time there&amp;rsquo;s a price to it. In terms of finding them, it should be pretty easy to build the list of ones in your space: you may see them quoted or referenced in various media outlets, or you may just naturally crop up in a report.&lt;/p>
&lt;p>If you create a regular relationship with them you&amp;rsquo;ll have a contract for some number of hours over the course of a quarter or year. At an early stage company this is often owned and managed by whoever runs your PR from an internal perspective.&lt;/p>
&lt;h3 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h3>
&lt;p>If you&amp;rsquo;re about to engage with analysts for the first time or haven’t figured out how to get the most out of your interactions, I hope this broad overview is helpful. If there are glaring parts you feel I&amp;rsquo;ve missed, let me know &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>. And for further reading/watching I’d encourage checking out the great talk from &lt;a href="http://www.twitter.com/cote">@cote&lt;/a> in the &lt;a href="http://www.heavybit.com/library/video/2014-01-21-michael-cote">Heavybit library&lt;/a>.&lt;/p>
&lt;p>As far as take-aways and a recap:&lt;/p>
&lt;ol>
&lt;li>Don&amp;rsquo;t be too eager to jump in with analysts. They can absolutely provide value, but you have to put some time in before it really starts to pay off. It&amp;rsquo;s not an overnight change and takes building a rapport with them.&lt;/li>
&lt;li>At the same time, analysts can be useful in many B2B areas, not just tech ones.&lt;/li>
&lt;li>When in an inquiry be open and as transparent as possible.&lt;/li>
&lt;li>Powerpoint/Keynote/Google presentations are useful in briefings, even if it&amp;rsquo;s just for you to follow along.&lt;/li>
&lt;/ol></description></item><item><title>A guide to PR for startups</title><link>/2015/07/21/An-intro-PR-guide-for-startups/</link><pubDate>Tue, 21 Jul 2015 12:55:56 -0800</pubDate><guid>/2015/07/21/An-intro-PR-guide-for-startups/</guid><description>&lt;p>You&amp;rsquo;ve built your product and you&amp;rsquo;re now ready for your first major launch. Or you&amp;rsquo;ve been through a launch or two, but are looking to scale the process as you&amp;rsquo;re doing more launches and announcements. You really have two options: do it &lt;a href="http://jasonlbaptiste.com/featured-articles/how-i-pitched-techcrunch-and-13-ways-to-get-press-when-you-launch-your-startup/">all on your own&lt;/a>, or work with a PR agency. One frequent crossroad is that you&amp;rsquo;re not at the point of a full-time PR person, but you&amp;rsquo;re unsure what a PR agency can offer you, and further, what&amp;rsquo;s the best way to work with them so you&amp;rsquo;re getting the maximum value.&lt;/p>
&lt;p>As I&amp;rsquo;ve talked to more startups lately, it&amp;rsquo;s become clear that effectively working with PR teams and the media is mostly learned by doing. Because there&amp;rsquo;s not much guidance out there, here&amp;rsquo;s an attempt at some basic guidelines.&lt;/p>
&lt;h3 id="on-pr" >
&lt;div>
On PR
&lt;/div>
&lt;/h3>
&lt;p>First, there are two types here and they&amp;rsquo;re not mutually exclusive. In-house PR is a full-time person or team that works within your company; here you&amp;rsquo;ll often have a pretty different experience. In my experience, in-house PR people tend to understand a company&amp;rsquo;s message and vision because they are living and breathing your company values every day.&lt;/p>
&lt;p>The other alternative is hiring a PR agency. An agency will have several clients (sometimes hundreds!). The relationship that you’ll have with an agency is much different than in-house. You&amp;rsquo;ll use them just like you would a consultant or contractor. Most startups end up with the agency approach first, because of the perception of “more people working for a cheaper cost than hiring in-house.” However, it&amp;rsquo;s worth noting that an agency doesn&amp;rsquo;t alleviate you of doing work, nor should you want them to handle all parts of it.&lt;/p>
&lt;h4 id="messaging" >
&lt;div>
Messaging
&lt;/div>
&lt;/h4>
&lt;p>An agency may offer to help with messaging, but take this somewhat lightly. I don&amp;rsquo;t doubt that some are very good at it, but in most cases I&amp;rsquo;ve found they don&amp;rsquo;t have the same amount of customer interaction as you as a founder or early employee would. Further, your vision of impact to the market and direction may be more distant than theirs. You should expect to own your messaging, just like you own your product.&lt;/p>
&lt;p>Where they can heavily help is in providing structured frameworks for getting to your messaging. Some pretty basic templates of standard questions for customers and partners can go a long way in helping you actually uncover what they feel your value is.&lt;/p>
&lt;p>&lt;em>On your key messaging/value prop, there are two pieces I&amp;rsquo;ll drop in here. While I&amp;rsquo;d love to write another long post on it, I wonder when I&amp;rsquo;ll actually get it out. So the first is pitch the problem you&amp;rsquo;re trying to solve–&lt;a href="http://500hats.typepad.com/500blogs/2009/08/your-solution-is-not-my-problem.html">Dave McClure&lt;/a> talks about this as well as anyone. The second is don&amp;rsquo;t pitch features, pitch the use cases and solutions. Pitch what&amp;rsquo;s possible.&lt;/em>&lt;/p>
&lt;h4 id="pitching" >
&lt;div>
Pitching
&lt;/div>
&lt;/h4>
&lt;p>This is the number one area where I&amp;rsquo;ve found that having PR makes a huge difference. In the world of reporting, different reporters have different beats (areas of coverage), styles, outreach preferences, and most importantly, different relationships with companies and people. Knowing &lt;em>all&lt;/em> of this and how to pitch a story to them is key. Yes, you can spend hours researching and creating a perfect story just for them, and do that again and again, and hopefully land some coverage. But I&amp;rsquo;d argue that&amp;rsquo;s not the best use of your time.&lt;/p>
&lt;p>With a good PR person or agency you&amp;rsquo;ll be able to strike a mix of:&lt;/p>
&lt;ul>
&lt;li>Here&amp;rsquo;s the outlets I want to be in and why (have a good reason for why).&lt;/li>
&lt;li>Understanding the audience and readership.&lt;/li>
&lt;li>What outlets you feel like your key customers are reading, and validate this with the agency.&lt;/li>
&lt;/ul>
&lt;p>From there, if you&amp;rsquo;ve found a good agency, they already have relationships with your key journalists and publications. So if you have a compelling product, you just need to give them the right messaging for the particular launch or news.&lt;/p>
&lt;h4 id="what-else-to-expect-from-your-agency" >
&lt;div>
What else to expect from your agency
&lt;/div>
&lt;/h4>
&lt;p>A surprise for some is how the whole process works. The agency is going to be there on the phone with you. You&amp;rsquo;re not going to hang out over beers being chummy while you pitch. The reporter is listening to multiple other pitches; it&amp;rsquo;s likely they had one right before you and right after. The agency is there listening, helping keep time and keeping track of the conversation for reporter fact-checking after the interview.&lt;/p>
&lt;p>Hopefully they&amp;rsquo;re also keeping notes. They should be able to provide you with some high level notes of what message resonated with each reporter and what didn&amp;rsquo;t, what you covered, and what they asked. This is especially useful for future interactions.&lt;/p>
&lt;p>Similarly you should get a briefing 1 pager ahead of time. You should be able to skim this, you don&amp;rsquo;t have to memorize. But it&amp;rsquo;ll include key things about recent articles written by the reporter, their beat, topics to dive into and ones to stay away from. If you can connect the dots, those notes from an initial call start to feed into the 1 pagers for future calls.&lt;/p>
&lt;h3 id="onto-the-briefing" >
&lt;div>
Onto the briefing
&lt;/div>
&lt;/h3>
&lt;p>Of course it&amp;rsquo;s important to land the briefing in the first place, but just as important is getting it right. Coming into it, the reporter will have already gotten the high level pitch&amp;hellip; It&amp;rsquo;s why they took the call. You&amp;rsquo;ll get a mixed bag, from those that are open to teeing up the opportunity to those that want to get right to the news. Roll with what they prefer, but also don&amp;rsquo;t be afraid of trying to hit some of your key points.&lt;/p>
&lt;h4 id="have-your-key-messages-ready" >
&lt;div>
Have your key messages ready
&lt;/div>
&lt;/h4>
&lt;p>Sound bites help hugely here. Analogies, customer references, whatever you want to hit. Have it ready. Also if you&amp;rsquo;ve got a great sound bite that helps tell the story, it can make the reporter&amp;rsquo;s job easier. Just don&amp;rsquo;t swing too far into happy go lucky marketing land. It’s important to remember that you’re talking to a person. Have a conversation - don’t talk at them.&lt;/p>
&lt;h4 id="go-slow" >
&lt;div>
Go slow
&lt;/div>
&lt;/h4>
&lt;p>It may seem obvious when you think about it, but as you&amp;rsquo;re talking the reporter is writing. Or at least you hope they are. Some do it by hand and type up notes later, some type right then and there. When you hear a pause it doesn&amp;rsquo;t always mean to keep going and it seldom means hurry up. Become extra comfortable with pauses. Check in on whether you&amp;rsquo;re going too fast, if they&amp;rsquo;re following, if they have any questions. I&amp;rsquo;ve had people bring me a beer before because I&amp;rsquo;d had multiple cups of coffee through a few pitches, and they were trying to slow me down a bit. Know your pace, and then slow it down.&lt;/p>
&lt;h4 id="questions" >
&lt;div>
Questions
&lt;/div>
&lt;/h4>
&lt;p>It&amp;rsquo;s okay if they don&amp;rsquo;t have a lot of questions, they may not. They may have none at all. Yes, pause, and give them a chance, or even ask if they have any. But don&amp;rsquo;t stress too much if they have no questions.&lt;/p>
&lt;p>On the flip side of that - your PR person should have prepared a list of questions for you beforehand that the reporter could possibly throw your way. Be sure you’ve thought through and practiced all the Q&amp;amp;A scenarios before the interview so you aren’t caught off-guard when you’re in front of the reporter.&lt;/p>
&lt;h3 id="in-conclusion" >
&lt;div>
In conclusion
&lt;/div>
&lt;/h3>
&lt;p>If it&amp;rsquo;s your first go around, don&amp;rsquo;t stress too much. Have the headlines you want in your mind and key messages, or better yet write them out. &lt;em>Personally I write key things on a whiteboard nice and large before I&amp;rsquo;m on the call&lt;/em>. Finally, once you&amp;rsquo;re all done, enjoy reading the coverage. &lt;strong>But you&amp;rsquo;re not all done&lt;/strong>: after you get some coverage, look back and run a retrospective just like you would for a software project. What worked well? Why did or didn&amp;rsquo;t something work? What can you improve next time?&lt;/p>
&lt;p>*Full disclosure, this is based on interactions with a small sample size of different PR agencies and individuals. Mileage may differ heavily from PR firm to PR firm, but hopefully the above provides at least some roadmap for more clarity vs. flying blind. As always if you&amp;rsquo;ve got feedback/questions, feel free to let me know &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>.&lt;/p>
&lt;p>&lt;em>Finally a special thanks to &lt;a href="http://www.twitter.com/pavtalk">Paul Katsen&lt;/a> for much of the inspiration on creating this post and to him and &lt;a href="http://www.twitter.com">Katie Boysen&lt;/a> for review&lt;/em>&lt;/p></description></item><item><title>Moving past averages in SQL (Postgres) – Percentiles</title><link>/2015/06/07/Moving-past-averages-in-SQL-Postgres-Percentiles/</link><pubDate>Sun, 07 Jun 2015 12:55:56 -0800</pubDate><guid>/2015/06/07/Moving-past-averages-in-SQL-Postgres-Percentiles/</guid><description>&lt;p>Often when you&amp;rsquo;re tracking a metric for the first time you take a look at your average. For example, what is your ARPU - Average Revenue Per User. In theory this tells you how much you&amp;rsquo;ll make off each new user you acquire. Or maybe you look at the average lifetime value of a customer. Yet, for those more familiar with looking at and extracting meaning from data, the median or a few different looks at &lt;a href="http://apmblog.dynatrace.com/2012/11/14/why-averages-suck-and-percentiles-are-great/">percentiles can be much more meaningful&lt;/a>.&lt;/p>
&lt;p>And while you can very easily get the &lt;code>AVG&lt;/code> in Postgres, with a little more effort you can report on percentiles as well. Window functions have been around for some time in Postgres. They allow you to order your result set over a certain group. The most basic example: if you order by date but want to know which record falls at place 10, you can use a window function and project out the &lt;code>rank()&lt;/code>.&lt;/p>
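&lt;p>That most basic example might look something like the following sketch (the &lt;code>created_at&lt;/code> column name is assumed here for illustration):&lt;/p>
&lt;pre>&lt;code>-- Sketch: assumes a purchases table with a created_at timestamp.
-- rank() numbers each row by its position within the ORDER BY.
SELECT id,
       created_at,
       rank() OVER (ORDER BY created_at) AS position
FROM purchases;
&lt;/code>&lt;/pre>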
&lt;p>Beyond outputting the rank yourself and doing extra manipulation, Postgres has some great utilities to make the most common uses even easier. Being able to compute things such as the 95th percentile directly on the data, or to lay out for every record in the result where it falls within a percentile, is hugely useful. Let&amp;rsquo;s take a look:&lt;/p>
&lt;p>Assuming you have a table called purchases, which has a total in it we could try:&lt;/p>
&lt;pre>&lt;code>SELECT id,
total,
ntile(100) OVER (ORDER BY total) AS perc_rank
FROM purchases;
&lt;/code>&lt;/pre>
&lt;p>This would give us something like:&lt;/p>
&lt;pre>&lt;code> id | total | perc_rank
----------|---------|-----------
264 | 12034 | 100
643 | 11830 | 100
...
...
304 | 751 | 95
&lt;/code>&lt;/pre>
&lt;p>What this would tell us is that less than 5% of our purchases have a total over 751. From here you can start to dig in and extract all sorts of different meanings, and by doing this directly in SQL you&amp;rsquo;re closer to the data and have one less processing step.&lt;/p>
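&lt;p>One note if you want to pull out just those records: window functions can&amp;rsquo;t be referenced directly in a &lt;code>WHERE&lt;/code> clause, so a sketch of filtering for the top 5% wraps the ranked query in a subquery:&lt;/p>
&lt;pre>&lt;code>-- Sketch against the same purchases table: compute the bucket
-- in a subquery, then filter on it in the outer query.
SELECT id, total
FROM (
  SELECT id,
         total,
         ntile(100) OVER (ORDER BY total) AS perc_rank
  FROM purchases
) ranked
WHERE perc_rank > 95;
&lt;/code>&lt;/pre>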
&lt;p>Percentiles get even more fun with the ordered set functions that came out in &lt;a href="/2014/02/02/Examining-PostgreSQL-9.4/">Postgres 9.4&lt;/a>. They even allow you to project out hypothetical values in certain cases. For now I&amp;rsquo;d encourage adding ntile to your toolbox anytime you&amp;rsquo;re analyzing averages or medians; it will make your world a bit better. Then consider exploring further on the &lt;a href="http://www.postgresql.org/docs/9.4/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE">ordered set functions&lt;/a>.&lt;/p></description></item><item><title>Upsert lands in PostgreSQL 9.5 – A first look</title><link>/2015/05/08/Upsert-lands-in-PostgreSQL-9.5-A-first-look/</link><pubDate>Fri, 08 May 2015 12:55:56 -0800</pubDate><guid>/2015/05/08/Upsert-lands-in-PostgreSQL-9.5-A-first-look/</guid><description>&lt;p>If you’ve followed anything I’ve &lt;a href="/2012/04/30/why-postgres/">written about Postgres&lt;/a>, you know that I’m a fan. At the same time you know that there’s been one feature so many other databases have which Postgres lacks, and it &lt;a href="/2014/08/15/my-postgres-wishlist-for-9.5/">causes a huge amount of angst for not being in Postgres&lt;/a>… Upsert. Well the day has come: it’s finally committed and will be available &lt;a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65">in Postgres 9.5&lt;/a>.&lt;/p>
&lt;p>Sure, we’re still several months away from Postgres 9.5 being released, anywhere from 3-6 months as a best guess. That doesn’t mean we can’t take a first look at this feature. Though before we get into it, a few special call-outs of thanks: to Peter Geoghegan of the &lt;a href="http://www.heroku.com/postgres">Heroku Postgres&lt;/a> team for being the primary author on it, to Andres Freund, who recently joined &lt;a href="https://www.citusdata.com">Citus Data&lt;/a>, for his heavy contributions, and to Heikki Linnakangas as well for his contributions.&lt;/p>
&lt;p>And now onto the exploration. Upsert is the common name, but if you’re unfamiliar, upsert is essentially create-or-update: create this new record, but if a conflict exists, update it. Let’s take a practical example.&lt;/p>
&lt;p>Assume you have a web scraper that imports product information into a table. Each product has a UPC code, title, description, and link. There’s a unique constraint on the UPC code. Now, if your web scraper tries to insert a new product, and a product with the same UPC already exists, you’d usually get an error. But you don’t want the query to fail, you’d want to update the existing product instead. Maybe with a new image, maybe a new description, whatever have you, but you don’t want it to blow up… you simply want to capture the new data and save it.&lt;/p>
&lt;p>&lt;strong>So before&lt;/strong>: Insert a record… Exception this violates a unique constraint… Let your app figure out what to do. &lt;em>protip: often applications would try to work around this, but you can run a chance of a race condition and duplicate records if there’s a conflict. TLDR; it’s not a perfect solution.&lt;/em>&lt;/p>
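&lt;p>For reference, that application-side workaround often looked roughly like this (a sketch, not a recommendation; the race lives between the two statements):&lt;/p>
&lt;pre>&lt;code>-- Pre-9.5 pattern (race-prone): try the UPDATE first
UPDATE products
SET description = 'a new description'
WHERE upc = 123456789;

-- If zero rows were updated, the application then runs:
INSERT INTO products (upc, description)
VALUES (123456789, 'a new description');
-- Another session can insert the same UPC between the two
-- statements, so this INSERT can still hit the unique constraint.
&lt;/code>&lt;/pre>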
&lt;p>&lt;strong>Now&lt;/strong>: Insert a record… There’s a unique constraint violation… Okay, let’s just update the existing record with the new values &lt;strong>inside a single statement&lt;/strong>.&lt;/p>
&lt;p>So enough explanation, here’s how it actually looks in the syntax:&lt;/p>
&lt;pre>&lt;code>INSERT INTO products (
upc,
title,
description,
link)
VALUES (
123456789,
'Figment #1 of 5',
'THE NEXT DISNEY ADVENTURE IS HERE - STARRING ONE OF DISNEY''S MOST POPULAR CHARACTERS!',
'http://www.amazon.com/dp/B00KGJVRNE?tag=mypred-20'
)
ON CONFLICT (upc) DO UPDATE SET description = EXCLUDED.description;
&lt;/code>&lt;/pre>
&lt;p>It’s been a long time coming for this, and it makes building applications that need this kind of behavior even easier. While it would have been great for this to be available years ago, kudos to Postgres and its community for taking the approach that is safe for your data. The result we have now both provides the desired behavior of create or update, &lt;strong>and&lt;/strong> is performant without the risk of race conditions for your data.&lt;/p></description></item><item><title>A product management blueprint</title><link>/2015/02/18/A-product-management-blueprint/</link><pubDate>Wed, 18 Feb 2015 12:55:56 -0800</pubDate><guid>/2015/02/18/A-product-management-blueprint/</guid><description>&lt;p>I find myself having more conversations with startups – both small and large – about product management. I&amp;rsquo;ve blogged about some of &lt;a href="http://www.craigkerstiens.com/2013/03/13/planning-and-prioritizing/">the tools&lt;/a> in my chest here but I haven&amp;rsquo;t talked much about my “blueprint” for product management, which I find myself laying out in many conversations over coffee. What follows is this process I’ve used a few times over with new teams to get product and engineering moving together, shipping in a predictable manner, and tackling bigger and more strategic projects.&lt;/p>
&lt;h3 id="trust" >
&lt;div>
Trust
&lt;/div>
&lt;/h3>
&lt;p>I need to know how to work with my team, what their working styles are, and how we interact. This starts by simply interacting – specifically, outside of the office. I heard a similar opinion recently from Chris Fry (who ran engineering at Salesforce and Twitter) when he remarked something to the effect of: “you can tell a good PM from a bad one based on if he goes to drinks with his team.” Without getting hung up on whether it’s beers or coffee, it’s more about socialization with your team and time outside the office. My personal approach: expect a dinner invite over to my place when I take on running product for a new team.&lt;/p>
&lt;h3 id="velocity" >
&lt;div>
Velocity
&lt;/div>
&lt;/h3>
&lt;p>Once you&amp;rsquo;ve started to build some rapport, it&amp;rsquo;s time to get down to business. If being able to quickly commit and ship something isn’t a problem for you, then it’s easy to just assume this is working. In reality most teams I encounter that need PM support don’t have shipping nailed down. You probably already know if you fall into that category of feeling like you can commit and ship vs. not, so if you’re not able to do that, a few tips:&lt;/p>
&lt;ul>
&lt;li>There are some projects that everyone wants to ship that have been tried over and over; &lt;strong>don’t tackle those first&lt;/strong>.&lt;/li>
&lt;li>Shipping something is better than nothing. It doesn&amp;rsquo;t have to be the right thing.&lt;/li>
&lt;li>Sometimes you don&amp;rsquo;t have to ship something to get velocity; you can launch things you already have.&lt;/li>
&lt;li>&lt;del>Kill scope&lt;/del> Test things earlier and more iteratively, the more you can validate or try something without requiring a large investment the more everyone feels better about the direction you’re heading.&lt;/li>
&lt;/ul>
&lt;p>The key here is to commit to projects, deliver, and move on. Your velocity depends solely on delivery, not tasks, not sprints, not projects, etc. If you haven’t shipped anything in a year, then your velocity for the year is zero. At a later point you should move from the focus on shipping anything to shipping the right things; it’s more important to ship 1 thing that moves the needle than 10 that don’t, but that’s a later concern.&lt;/p>
&lt;h3 id="killing-things" >
&lt;div>
Killing things
&lt;/div>
&lt;/h3>
&lt;p>On the note of killing scope&amp;hellip; I&amp;rsquo;ve heard it articulated at times that some engineers are happy when certain PMs show up because it means less work for them. When you go over to an engineer’s desk, are you creating more or less work? The answer should be less some large percentage of the time. If you can find a way to accomplish your goals with less effort, it&amp;rsquo;s always a win. Every project everywhere always needs more time or money; what’s more innovative is how you can help a project ship without one of those two.&lt;/p>
&lt;p>At a broader perspective than just scope – one of the biggest ways product can help engineering is by pushing harder for killing off features and the scope of a product. There&amp;rsquo;s a good test of whether something is ready to ship: if you &lt;a href="http://www.craigkerstiens.com/2014/08/13/when-to-ship-when-to-kill/">tell beta users you&amp;rsquo;re killing it&lt;/a> and they yell at you that you shouldn’t kill it, then it’s ready to ship.&lt;/p>
&lt;p>If you’ve already shipped things, but they’re not delivering value or not being used, kill them. It’s that simple, it may have been a great idea at the time, but either invest in making sure it’s used or kill it so you don’t have to maintain it.&lt;/p>
&lt;h3 id="roadmap-planning" >
&lt;div>
Roadmap planning
&lt;/div>
&lt;/h3>
&lt;p>Usually getting velocity and killing things takes 3-6 months to really take full effect. At this point a team feels like they&amp;rsquo;re not under a pile of technical debt, and they can commit to shipping projects. This is the point when product and engineering are melding and you can really start to have fun with where you&amp;rsquo;re headed. Here I&amp;rsquo;ve seen a huge mix of how actively engineers are engaged in this process. And the reality is it&amp;rsquo;s everyone’s job to be thinking about where you&amp;rsquo;re headed as a company – at least that&amp;rsquo;s the case for any company that classifies itself as a startup.&lt;/p>
&lt;p>My favorite tool for this is a team gridding exercise, you can read more about this &lt;a href="http://www.craigkerstiens.com/2013/03/13/planning-and-prioritizing/">here&lt;/a> and &lt;a href="http://www.craigkerstiens.com/2013/08/13/rule-of-thirds/">here&lt;/a>. This is often best conducted at an off-site where you have an opportunity for casual conversation which can foster broader thinking beyond the obvious bug fixes or smaller product improvements.&lt;/p>
&lt;p>&lt;em>One item of note I&amp;rsquo;ve heard from teams that have done this or similar exercises is they still have trouble deciding what to do after the fact. The role of product is to get to that decision. The most important part is getting to a decision, not the perfect one: gather data, decide, and revisit as you go along. All of this isn’t to say that it’s an arbitrary decision; customers and data inform it, as does the effort-to-impact matrix exercise, but in the end a clear direction isn’t executed by consensus.&lt;/em>&lt;/p>
&lt;h3 id="in-conclusion" >
&lt;div>
In conclusion
&lt;/div>
&lt;/h3>
&lt;p>There’s really no end or done when it comes to the role and the work.&lt;/p>
&lt;p>There’s always another milestone and the market is always moving around you. But once you’re able to execute predictably and think in an ordered sense about your roadmap, you’re in a position to monitor and adapt to the market, and even more so to experiment and shape the market yourself. At that point you have to keep doing it, and the hard part becomes finding ways of keeping a fresh perspective. &lt;em>Protip: customers are an important part of that equation.&lt;/em>&lt;/p>
&lt;p>Have tips/tricks/practices that I completely missed here or that you disagree with? I’m always happy to talk with others so drop me a note &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a>.&lt;/p></description></item><item><title>A simple guide for DB migrations</title><link>/2014/10/01/A-simple-guide-for-DB-migrations/</link><pubDate>Wed, 01 Oct 2014 12:55:56 -0800</pubDate><guid>/2014/10/01/A-simple-guide-for-DB-migrations/</guid><description>&lt;p>Most web applications will add/remove columns over time. This is extremely common early on and even mature applications will continue modifying their schemas with new columns. An all too common pitfall when adding new columns is setting a not null constraint in Postgres.&lt;/p>
&lt;h3 id="not-null-constraints" >
&lt;div>
Not null constraints
&lt;/div>
&lt;/h3>
&lt;p>What happens when you add a column with a not null constraint and a default value is that Postgres will re-write the entire table. Under the covers Postgres is really just an append-only log, so when you update or delete data it&amp;rsquo;s really just writing new data. This means that when you add a column with a value for every row, it has to write a new record for each one. If you do this while requiring the column to not be null, you&amp;rsquo;re re-writing your entire table.&lt;/p>
&lt;p>Where this becomes problematic for larger applications is that the re-write holds a lock, preventing you from writing new data during this time.&lt;/p>
&lt;h3 id="a-better-way" >
&lt;div>
A better way
&lt;/div>
&lt;/h3>
&lt;p>Of course you may want to disallow nulls and you may want to set a default value; the problem comes when you try to do this all at once. The safest approach, at least in terms of uptime for your table -&amp;gt; data -&amp;gt; application, is to break these steps apart:&lt;/p>
&lt;ol>
&lt;li>Start by adding the column, allowing nulls but setting a default value&lt;/li>
&lt;li>Run a background job that retroactively updates the new column to your default value&lt;/li>
&lt;li>Add your not null constraint.&lt;/li>
&lt;/ol>
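&lt;p>&lt;em>As a rough sketch of those three steps (the &lt;code>accounts&lt;/code> table and &lt;code>plan&lt;/code> column here are hypothetical):&lt;/em>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- 1. Add the column, nullable, with a default
ALTER TABLE accounts ADD COLUMN plan text DEFAULT 'free';

-- 2. Backfill existing rows from a background job (ideally in batches)
UPDATE accounts SET plan = 'free' WHERE plan IS NULL;

-- 3. Once every row has a value, add the constraint
ALTER TABLE accounts ALTER COLUMN plan SET NOT NULL;
&lt;/code>&lt;/pre>&lt;/div>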
&lt;p>Yes, it&amp;rsquo;s a few extra steps, but having walked through this with a number of developers and their apps, I can say it makes for a much smoother process for making changes to your apps.&lt;/p></description></item><item><title>My wishlist for Postgres 9.5</title><link>/2014/08/15/My-wishlist-for-Postgres-9.5/</link><pubDate>Fri, 15 Aug 2014 12:55:56 -0800</pubDate><guid>/2014/08/15/My-wishlist-for-Postgres-9.5/</guid><description>&lt;p>As I followed along with the &lt;a href="/2014/03/24/Postgres-9.4-Looking-up/">9.4 release&lt;/a> of Postgres I had a few posts about things that I was excited about, some things that missed the release, and a bit of a wrap-up. I thought this year (year in the sense of PG releases) I&amp;rsquo;d jump the gun and lay out areas I&amp;rsquo;d love to see addressed in PostgreSQL 9.5. And here it goes:&lt;/p>
&lt;h3 id="upsert" >
&lt;div>
Upsert
&lt;/div>
&lt;/h3>
&lt;p>Merge, upsert, insert-or-update – whatever you want to call it, it&amp;rsquo;s still a huge wart that this doesn&amp;rsquo;t exist. A few implementations have shown up on mailing lists, and to the best of my understanding there&amp;rsquo;s been debate about whether they&amp;rsquo;re performant enough, or some people would prefer another implementation, or I don&amp;rsquo;t know what other excuse. The short of it is this really needs to happen; until then you can always &lt;a href="http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-in-postgresql/8702291#8702291">implement it with a CTE&lt;/a>, which can have a race condition.&lt;/p>
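&lt;p>&lt;em>For illustration, the CTE approach from that link looks roughly like the following (the counters table here is hypothetical), with the caveat that two concurrent runs can still race:&lt;/em>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">WITH upsert AS (
    UPDATE counters SET value = value + 1
    WHERE name = 'hits'
    RETURNING *
)
INSERT INTO counters (name, value)
SELECT 'hits', 1
WHERE NOT EXISTS (SELECT 1 FROM upsert);
&lt;/code>&lt;/pre>&lt;/div>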
&lt;h3 id="foreign-data-wrappers" >
&lt;div>
Foreign Data Wrappers
&lt;/div>
&lt;/h3>
&lt;p>There&amp;rsquo;s so much opportunity here, and this has easily been my &lt;a href="/2013/08/05/a-look-at-FDWs/">favorite feature of the past 2-3 years in Postgres&lt;/a>. Really any improvement is good here, but a hit list of a few valuable things:&lt;/p>
&lt;ul>
&lt;li>Pushdown of conditions&lt;/li>
&lt;li>Ability to pass a DSN to a utility function that creates the foreign user and tables.&lt;/li>
&lt;li>Better security around creds of foreign tables&lt;/li>
&lt;li>More out of the box FDWs&lt;/li>
&lt;/ul>
&lt;h3 id="statsanalytics" >
&lt;div>
Stats/Analytics
&lt;/div>
&lt;/h3>
&lt;p>Today there&amp;rsquo;s &lt;a href="http://madlib.net/">madlib&lt;/a> for machine learning, and 9.4 got support for &lt;a href="http://www.depesz.com/2014/01/11/waiting-for-9-4-support-ordered-set-within-group-aggregates/">ordered set aggregates&lt;/a>, but even still Postgres needs to keep moving forward here. PL-R and PL-Python can help a good bit as well, but having more out of the &lt;a href="http://www.postgresql.org/docs/9.3/static/functions-aggregate.html">box functions&lt;/a> for stats can continue to keep it at the front of the pack for a database that&amp;rsquo;s not only safe for your data, but powerful to do analysis with.&lt;/p>
&lt;h3 id="multi-master" >
&lt;div>
Multi-master
&lt;/div>
&lt;/h3>
&lt;p>This is definitely more of a dream than not. Full multi-master replication would be amazing, and it&amp;rsquo;s getting closer to possible. The sad truth is even once it lands it will probably require a year of maturing – all the more reason for it to hopefully hit in 9.5.&lt;/p>
&lt;h3 id="logical-replication" >
&lt;div>
Logical Replication
&lt;/div>
&lt;/h3>
&lt;p>The foundation made it in for 9.4, which is huge. This means we&amp;rsquo;ll probably see good, working, out-of-the-box logical replication in 9.5. For those less familiar, this means the replication is SQL based vs. the binary WAL stream, so things like using replication to upgrade across versions become possible. Not quite zero downtime, but roughly a minute or two to upgrade versions, even for large DBs.&lt;/p>
&lt;h3 id="an-official-gui" >
&lt;div>
An official GUI
&lt;/div>
&lt;/h3>
&lt;p>Alright this one is probably a pipe dream. And to kick it off, no pgAdmin doesn&amp;rsquo;t cut it. A good end user tool for connecting/querying would be huge. Fortunately the ecosystem is improving here with &lt;a href="http://www.jackdb.com">JackDB&lt;/a> (web based) and &lt;a href="https://eggerapps.at/pgcommander/">PG Commander&lt;/a> (mac app), but these still aren&amp;rsquo;t discoverable enough for most users.&lt;/p>
&lt;h3 id="what-do-you-want" >
&lt;div>
What do you want?
&lt;/div>
&lt;/h3>
&lt;p>So there&amp;rsquo;s my wishlist, what&amp;rsquo;s yours for 9.5? Let me know - &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>.&lt;/p></description></item><item><title>When to ship it, when to kill it</title><link>/2014/08/13/When-to-ship-it-when-to-kill-it/</link><pubDate>Wed, 13 Aug 2014 12:55:56 -0800</pubDate><guid>/2014/08/13/When-to-ship-it-when-to-kill-it/</guid><description>&lt;p>A few weeks ago at lunch I had the opportunity to catch up with a company in the current YC batch building something very similar to dataclips. We talked about a lot of things: what we&amp;rsquo;ve learned from dataclips, marketing, and other areas. One area was product and when to ship vs. when to kill things, and I realized I hadn&amp;rsquo;t talked publicly about my fairly simple but clear view on this, so here it is.&lt;/p>
&lt;p>&lt;em>A large credit to &lt;a href="http://www.twitter.com/hirodusk">Adam Wiggins&lt;/a> for giving this model early on in Heroku and his approach to shipping product.&lt;/em>&lt;/p>
&lt;h3 id="a-precursor-to-shipping" >
&lt;div>
A precursor to shipping
&lt;/div>
&lt;/h3>
&lt;p>First a little background on shipping: I&amp;rsquo;m going to assume you have some process of alpha/beta testing with users. This is actually fairly key; if you&amp;rsquo;re not testing with users then the rest of this is all moot. Alpha and beta testing is pretty simple: you need some early users. These can be friends, people within a network, or random users you select. There&amp;rsquo;s different value in how you select these, but that&amp;rsquo;s a topic for another time and place.&lt;/p>
&lt;h3 id="on-to-shipping" >
&lt;div>
On to shipping
&lt;/div>
&lt;/h3>
&lt;p>So how do you know it&amp;rsquo;s ready? The basic idea is super simple. Give it to some users in alpha/beta testing, or start to roll it out following a one -&amp;gt; some -&amp;gt; many -&amp;gt; all principle (maybe to 5% or 10% of your userbase). Then take that brand new feature away.&lt;/p>
&lt;p>There&amp;rsquo;s a couple of ways to do this as far as mechanics. If you&amp;rsquo;re in contact with users, such as alpha/beta users you were higher touch with, just email them. Tell them you&amp;rsquo;re removing the feature, or if you want to approach it more softly, ask them how much they&amp;rsquo;d miss it if it were gone tomorrow. If you&amp;rsquo;re rolling it out more broadly, perhaps behind a feature flag, flip it off and watch for feedback.&lt;/p>
&lt;p>&lt;em>Once you take the feature away (or threaten to), if you don&amp;rsquo;t have users with pitchforks almost immediately, then it&amp;rsquo;s not ready to ship&lt;/em>.&lt;/p>
&lt;p>Go back to the drawing board and work more on it, or simply kill it. As @james_heroku would say: &amp;ldquo;So you&amp;rsquo;re saying the reason to ship the shitty thing now is because you&amp;rsquo;ve spent a lot of time on it?&amp;rdquo;. Stepping back it&amp;rsquo;s all logical, but all too often it&amp;rsquo;s not put into practice when shipping.&lt;/p>
&lt;h3 id="your-metrics-can-lie" >
&lt;div>
Your metrics can lie
&lt;/div>
&lt;/h3>
&lt;p>Relying on just seeing a user spend some time on the new feature can often be misleading vs. the above approach. There&amp;rsquo;s a great talk by Des Traynor over at intercom.io that hits on this in part; the basic premise is that users shifting time from feature X to feature Y doesn&amp;rsquo;t mean it was a success, it just means they&amp;rsquo;re spending time on something different. In launching new things you want to increase the overall value of your product, not simply shift users&amp;rsquo; focus to the new flavor of the week.&lt;/p></description></item><item><title>Scaling Organizations - Scribing</title><link>/2014/07/14/Scaling-Organizations-Scribing/</link><pubDate>Mon, 14 Jul 2014 12:55:56 -0800</pubDate><guid>/2014/07/14/Scaling-Organizations-Scribing/</guid><description>&lt;p>In the process of growing a company there are several hurdles based on the size of the company. What worked at 5 doesn&amp;rsquo;t work at 20, what worked at 20 doesn&amp;rsquo;t work at 50, and what worked at 50 doesn&amp;rsquo;t work at 150. There&amp;rsquo;s a lot of talk about &lt;a href="http://lifehacker.com/5965280/follow-jeff-bezos-two-pizza-rule-to-avoid-the-dangers-of-groupthink">two pizza teams&lt;/a> and &lt;a href="http://adam.herokuapp.com/past/2011/4/28/scaling_a_development_team/">scaling development teams&lt;/a> out there. One thing I haven&amp;rsquo;t seen quite enough of is details around scribing and documenting things.&lt;/p>
&lt;h3 id="planning" >
&lt;div>
Planning
&lt;/div>
&lt;/h3>
&lt;p>At teams of 2 and 3 you get everyone in a room. Perhaps 1 person says what you&amp;rsquo;re going to do and you all rally around it, or maybe it&amp;rsquo;s a day of debate and persuasion from all sides.&lt;/p>
&lt;p>In the end though you all leave, get heads down, but all know what goal you&amp;rsquo;re working towards. At a larger company planning doesn&amp;rsquo;t scale quite this way. I&amp;rsquo;ve seen roadmapping and planning done a variety of ways as companies scale, but most times the thing they miss for far too long is documenting what comes out of it. Many may produce some level of artifact, but a cohesive wrap-up is often missed. Such an artifact should be easily digestible within a couple minutes, but also deep enough to answer many of the initial questions raised by the high level pieces.&lt;/p>
&lt;h3 id="meetings" >
&lt;div>
Meetings
&lt;/div>
&lt;/h3>
&lt;p>Meetings are a smaller level item than broader planning, and tend to get less thorough note taking than higher level planning. With growth you&amp;rsquo;ll have more meetings – trust me, you will. The more meetings you have, the more likely you are to miss one or two you&amp;rsquo;re interested in. Or perhaps it&amp;rsquo;s as simple as some team members being out. Summer is especially hard around this. For a team of 10 it&amp;rsquo;s not uncommon to go all summer with at least one person missing from a meeting, and often two.&lt;/p>
&lt;p>Keeping those that miss the meeting well informed of what happened at it is critical as you scale. This is slightly less important at an extremely large company, though still valuable, but it&amp;rsquo;s critical &lt;em>while you&amp;rsquo;re scaling&lt;/em>. As you&amp;rsquo;re scaling things are changing faster, and context can more easily get lost.&lt;/p>
&lt;p>So how do you improve this?&lt;/p>
&lt;p>Some practical tips:&lt;/p>
&lt;ul>
&lt;li>Having a set of running notes, with someone consistently scribing, is a great standard to set. &lt;em>If you missed a meeting you know where to go for it.&lt;/em>&lt;/li>
&lt;li>Recording who was and was not at the meeting can be incredibly valuable. I&amp;rsquo;ve heard statements like &amp;ldquo;I said X at Y meeting&amp;rdquo;; the only problem with that statement is I wasn&amp;rsquo;t at Y meeting.&lt;/li>
&lt;li>Not only recording the meeting notes, but explicitly calling out who&amp;rsquo;s not there can help to know if that information should be explicitly passed along vs. just missed.&lt;/li>
&lt;li>Within your long running document have a summary to wrap it up. While scribing is great it can lead to not seeing the forest for the trees at times.&lt;/li>
&lt;/ul>
&lt;p>And a few from others:&lt;/p>
&lt;ul>
&lt;li>Meetings need a &lt;strong>purpose&lt;/strong> and an &lt;strong>agenda&lt;/strong>. If I don&amp;rsquo;t know why I&amp;rsquo;m having a meeting, or what will be covered, I won&amp;rsquo;t go. If I&amp;rsquo;m organizing a meeting and can&amp;rsquo;t spare the time to produce an agenda and goal, I shouldn&amp;rsquo;t waste other people&amp;rsquo;s time with the meeting – &lt;a href="http://www.twitter.com/jacobian">@jacobian&lt;/a>&lt;/li>
&lt;li>Any meeting over about 15-20 people isn&amp;rsquo;t a meeting, it&amp;rsquo;s a presentation (which is OK too, but make it clear that it&amp;rsquo;s a download, not a discussion). – &lt;a href="http://www.twitter.com/jacobian">@jacobian&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="email" >
&lt;div>
Email
&lt;/div>
&lt;/h3>
&lt;p>If you aren&amp;rsquo;t aware, I&amp;rsquo;m a &lt;a href="/2014/02/07/my-email-hacks/">big fan of email&lt;/a>. Email is almost guaranteed to be at least opened (at least if it&amp;rsquo;s to a person or a clear enough list). If you have something you want someone to read – email it. You can have a canonical wiki, or Trello board, or a variety of tools, but email will get more eyeballs than any of these. At the same time, don&amp;rsquo;t email things that are already documented elsewhere.&lt;/p>
&lt;p>Emails are great for highlighting the things people absolutely need to know about. Short and concise emails will also help to improve reach. Be careful to make these emails have a high ratio of value to length. If you have a lot of extra follow-on content, send readers somewhere else for it.&lt;/p>
&lt;p>Finally don’t overuse email. If you’re sending the same thing every week people will become numb to this. &lt;a href="http://www.yesware.com">Monitoring if your emails are being opened/responded&lt;/a> to can help to know if you&amp;rsquo;re over-broadcasting.&lt;/p></description></item><item><title>Postgres and Connection Pooling</title><link>/2014/05/22/Postgres-and-Connection-Pooling/</link><pubDate>Thu, 22 May 2014 12:55:56 -0800</pubDate><guid>/2014/05/22/Postgres-and-Connection-Pooling/</guid><description>&lt;p>Connection pooling is quickly becoming one of the more frequent questions I hear. So here&amp;rsquo;s a primer on it. If there&amp;rsquo;s enough demand I&amp;rsquo;ll follow up a bit further with some detail on specific Postgres connection poolers and setting them up.&lt;/p>
&lt;h3 id="the-basics" >
&lt;div>
The basics
&lt;/div>
&lt;/h3>
&lt;p>For those unfamiliar, a connection pool is a group of database connections sitting around waiting to be handed out and used. This means when a request comes in, a connection is already there, whether in your framework or some other pooling process, and is then given to your application for that specific request or transaction. In contrast, without any connection pooling your application has to reach out to your database to establish a connection each time. While in the most basic sense you may think connecting to a database is quick, often there&amp;rsquo;s &lt;a href="/2013/03/07/Fixing-django-db-connections/">some overhead here&lt;/a>. An example is the SSL negotiation that may have to occur, which means you&amp;rsquo;re looking at not 1-2 ms but often closer to 30-50.&lt;/p>
&lt;h3 id="the-options" >
&lt;div>
The options
&lt;/div>
&lt;/h3>
&lt;p>There&amp;rsquo;s really two major options when it comes to connection pooling:&lt;/p>
&lt;ul>
&lt;li>Framework pooling&lt;/li>
&lt;li>Standalone pooler&lt;/li>
&lt;li>&lt;em>Persistent connections&lt;/em>&lt;/li>
&lt;/ul>
&lt;h4 id="framework-pooling" >
&lt;div>
Framework pooling
&lt;/div>
&lt;/h4>
&lt;p>Today many modern application frameworks have at least some basic level of connection pooling. This means as your application server starts up it will create a pool of connections to use. It&amp;rsquo;s worth noting that while most modern frameworks have pooling, not all do, and further it may not be enabled by default.&lt;/p>
&lt;p>If you&amp;rsquo;re using the Sequel ORM for Ruby or SQLAlchemy for Python you&amp;rsquo;re well covered here. Further &lt;a href="https://devcenter.heroku.com/articles/concurrency-and-database-connections">Rails&lt;/a> is in pretty good shape also, though you may want to configure the pool size. For Django it&amp;rsquo;s a bit of a mixed story. For some time &lt;a href="/2013/03/07/Fixing-django-db-connections/">Django&lt;/a> did not have pooling at all. As of Django 1.6 you now have persistent connections by default and the ability to enable a pool.&lt;/p>
&lt;h4 id="persistent-connections" >
&lt;div>
Persistent connections
&lt;/div>
&lt;/h4>
&lt;p>Persistent connections don&amp;rsquo;t offer all of the benefits of pooling, but can often work well enough. Persistent connections are simply the act of maintaining a connection to your database once it&amp;rsquo;s established. In the case where you have overhead of 30-50 ms each time you connect, this can be quite helpful. At the same time the number of things that can interact with your database is limited, as you&amp;rsquo;re limited to one connection per entry point to your webserver.&lt;/p>
&lt;h4 id="standalone-pooling" >
&lt;div>
Standalone pooling
&lt;/div>
&lt;/h4>
&lt;p>Postgres can be a bit of a sore spot when it comes to handling a ton of connections. Each connection to your database carries some memory overhead; casual observations have seen it be between 5 and 10 MB assuming some basic query workload. And even if you have the memory headroom on your Postgres instance, there comes a point where management of connections itself becomes a limiting factor – we&amp;rsquo;ve seen this somewhere in the hundreds. While framework level connection poolers can give better performance and lengthen the time before you have to deal with something more complex, if you&amp;rsquo;re successful that time may come.&lt;/p>
&lt;p>&lt;em>A rule of thumb I&amp;rsquo;d use is if you have over 100 connections you want to look at something more robust.&lt;/em>&lt;/p>
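&lt;p>&lt;em>To see where you stand against that rule of thumb, you can check your current connection count directly in Postgres:&lt;/em>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- One row per backend, so this counts open connections
SELECT count(*) FROM pg_stat_activity;
&lt;/code>&lt;/pre>&lt;/div>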
&lt;p>In this case that something more robust is a standalone pooler specifically for Postgres. A standalone pooler can be much more configurable overall, letting you specify how it pools: per Postgres session, transaction, or statement. Further, these are designed specifically to handle a very large pool of connections to Postgres without adding much overhead. In contrast to the roughly 5 MB for a standard Postgres connection, PG Bouncer uses about 2 kB per connection.&lt;/p>
&lt;p>So once you&amp;rsquo;re at the point of needing one there&amp;rsquo;s really two options.&lt;/p>
&lt;ol>
&lt;li>&lt;a href="http://pgfoundry.org/projects/pgbouncer">PG Bouncer&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.pgpool.net/mediawiki/index.php/Main_Page">PG Pool&lt;/a>&lt;/li>
&lt;/ol>
&lt;h3 id="pg-bouncer" >
&lt;div>
PG Bouncer
&lt;/div>
&lt;/h3>
&lt;p>My short and sweet recommendation is towards PG Bouncer. Contrary to how it&amp;rsquo;s named, PG Pool is a multi purpose tool that does a lot of things (pooling, load balancing, replication, more). PG Bouncer takes the philosophy of doing one thing and doing it extremely well. I tend to favor these types of tools, which is the same reason I lean towards &lt;a href="https://github.com/wal-e/wal-e">WAL-E&lt;/a> to help with Postgres replication.&lt;/p>
&lt;h3 id="need-more" >
&lt;div>
Need more?
&lt;/div>
&lt;/h3>
&lt;p>Need more guidance with setting up and running PGBouncer? Give this &lt;a href="http://datachomp.com/archives/getting-started-with-pgbouncer/">guide&lt;/a> a look or try the &lt;a href="https://github.com/gregburek/heroku-buildpack-pgbouncer">pgbouncer buildpack&lt;/a> if running on Heroku. If you&amp;rsquo;re still interested in a deeper guide let me know &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a> and I&amp;rsquo;ll work on getting it into the queue.&lt;/p>
&lt;p>Finally, make sure to sign-up below to get updates on Postgres content and first access to training.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Personas, data science, k-means</title><link>/2014/05/08/Personas-data-science-k-means/</link><pubDate>Thu, 08 May 2014 12:55:56 -0800</pubDate><guid>/2014/05/08/Personas-data-science-k-means/</guid><description>&lt;p>If one of the industry lingo terms in the title didn&amp;rsquo;t make your skin crawl a little then I need to try harder. At the same time you&amp;rsquo;ve probably heard someone use one of them in a non-trolling way in the last month. All three of these can often actually mean the same or similar things, it&amp;rsquo;s just people approach them differently from their world perspective.&lt;/p>
&lt;p>Personas don&amp;rsquo;t have to be marketing only speak, and data science doesn&amp;rsquo;t have to be only for stats people. My goal here is to simply set a context for the rest of the meat which talks about how you can simply look at your data and let it surface things you may not have known.&lt;/p>
&lt;h3 id="personas" >
&lt;div>
Personas
&lt;/div>
&lt;/h3>
&lt;p>I most commonly hear this term from &amp;ldquo;business people&amp;rdquo;. In fact not too long ago I recall interacting with someone that wanted to define personas for a company. They wanted to give them names, Joe and Mary. Joe is a father of 2; he works between 8 and 5 because he has to pick the kids up from school; he&amp;rsquo;s always worked at Fortune 100 companies. Mary is single; she&amp;rsquo;s a small business owner; she likes using tools instead of building things herself. If you think this is an exaggeration of what you might expect, that&amp;rsquo;s fair. Let&amp;rsquo;s take a company I&amp;rsquo;m fond of, &lt;a href="http://www.travisci.com">Travis CI&lt;/a>; if someone were to do this for them it might look like:&lt;/p>
&lt;ul>
&lt;li>Enterprise QA developer&lt;/li>
&lt;li>Startup full stack engineer&lt;/li>
&lt;li>Open source contributor&lt;/li>
&lt;/ul>
&lt;p>&lt;em>While this is all fine and good, a name and what they do doesn&amp;rsquo;t help in the substantial way I&amp;rsquo;d like.&lt;/em> Sure use personas if it helps you think about who you&amp;rsquo;re building the product for, but don&amp;rsquo;t expect customers to say yes I fit into only this bucket by trying to create classifications like this.&lt;/p>
&lt;p>&lt;strong>Let&amp;rsquo;s rephrase this to be super simple: not just groupings of people, but groupings of anything that have a likely outcome based on various inputs. Perhaps a better term for it is archetype.&lt;/strong>&lt;/p>
&lt;h3 id="data-science" >
&lt;div>
Data science
&lt;/div>
&lt;/h3>
&lt;p>The application of math or statistics to learn something about your business. It doesn&amp;rsquo;t have to be big data, or NoSQL – simply the application of an algorithm to learn something. Extending it a bit, let&amp;rsquo;s assume it&amp;rsquo;s to do something actionable. This is a bit of a chicken and egg, because you can&amp;rsquo;t look at different data the same way every time and have a valuable interpretation. Sometimes it requires using several methods and examining the quality of the results. We can apply a little more clarity and judgement to ease this process though.&lt;/p>
&lt;h3 id="k-means" >
&lt;div>
k-means
&lt;/div>
&lt;/h3>
&lt;p>Alright, onto the meat of what I was hoping to dig into here. Well, actually, first a little more of a detour. Tracking key data for your business should be extremely clear. Hopefully you&amp;rsquo;re already doing this; if you&amp;rsquo;re not already tracking &lt;a href="/2014/02/26/Tracking-MoM-growth-in-SQL/">month over month growth&lt;/a> then go implement it today. If you don&amp;rsquo;t know your lifetime value or attrition rate then get on those too. But if you do have all that and are still unclear how to move the needle on some goal (maybe that goal is increasing lifetime value), then we&amp;rsquo;re at the right place.&lt;/p>
&lt;p>An extremely old algorithm for grouping things together, and one fairly commonly known in stats communities, is &lt;a href="http://en.wikipedia.org/wiki/K-means_clustering">k-means&lt;/a>. It will group things together based on their likeness into some set of groups; that&amp;rsquo;s where the k comes from. It&amp;rsquo;s also known as an unsupervised clustering method, because you simply put the data in and let it create these groupings for you. But why or how is it useful? You know you want to influence lifetime value, so you should just find what makes people increase it and move that. Well&amp;hellip; we may be able to get there with k-means.&lt;/p>
&lt;h3 id="practicality" >
&lt;div>
Practicality
&lt;/div>
&lt;/h3>
&lt;p>Most commonly when you search for k-means you&amp;rsquo;ll find some image similar to the one at the top of the post. This image graphically represents the clustering and the center of those clusters. While visually interesting, it doesn&amp;rsquo;t actually tell you how to act upon it. A clearer way is often examining the clusters and what&amp;rsquo;s common within each; this tells you how to actually treat that archetype differently.&lt;/p>
&lt;p>In his book &lt;a href="http://www.amazon.com/dp/B00F0WRXI0?tag=mypred-20">Data Smart&lt;/a> John Foreman actually does a great job of laying this out in a practical way. I&amp;rsquo;m particularly partial to his example also because it uses wine. His example generates a variety of groupings; looking at the surrounding metadata it&amp;rsquo;s then possible to discover that:&lt;/p>
&lt;ul>
&lt;li>Grouping 1 likes Pinot&lt;/li>
&lt;li>Grouping 2 likes buying in bulk&lt;/li>
&lt;li>Grouping 3 likes buying small volume&lt;/li>
&lt;li>Grouping 4 likes bubbly&lt;/li>
&lt;/ul>
&lt;p>From here you can then start to get some idea of what you&amp;rsquo;d do with this. Perhaps you&amp;rsquo;d create a deal each month so that it appeals to all groups, or target them with different deals. Or maybe you&amp;rsquo;d simply not send an email to them if you didn&amp;rsquo;t have a deal that month. Of course you could go more granular down into a recommendation engine to get a personalized recommendation for each customer, but for a lot of smaller apps/sites that&amp;rsquo;s simply not feasible.&lt;/p>
&lt;p>So in this case the output would look less like the image at the top and more like a set of 4 groups, plus a CSV of every user and which grouping they fall in. Yes, it&amp;rsquo;s a less sexy graph, but a much more applicable CSV or Excel output.&lt;/p>
&lt;p>In the end what we&amp;rsquo;ve really done is define personas or archetypes based on what&amp;rsquo;s similar between customers vs. the arbitrary perceptions we may come in with.&lt;/p>
&lt;h3 id="whats-next" >
&lt;div>
Whats next
&lt;/div>
&lt;/h3>
&lt;p>Up next I&amp;rsquo;ll actually dig in on a real world example here. &lt;a href="http://www.twitter.com/alexbaldwin">Alex&lt;/a> over at &lt;a href="https://hackdesign.org/">HackDesign&lt;/a> was kind enough to give me access to their data to create a more practical example of this. While I&amp;rsquo;m just now digging in, there should be a tangible example of this to follow.&lt;/p></description></item><item><title>Postgres Datatypes – The ones you're not using.</title><link>/2014/05/07/Postgres-Datatypes-The-ones-youre-not-using./</link><pubDate>Wed, 07 May 2014 12:55:56 -0800</pubDate><guid>/2014/05/07/Postgres-Datatypes-The-ones-youre-not-using./</guid><description>&lt;p>Postgres has a variety of datatypes, in fact quite a few more than most other databases. Most commonly applications take advantage of the standard ones – integers, text, numeric, etc. Almost every application needs these basic types; the rarer ones are needed less frequently. And while not needed in every application, when you do need them they can be extremely handy. So without further ado let&amp;rsquo;s look at some of these rarer but awesome types.&lt;/p>
&lt;h3 id="hstore" >
&lt;div>
hstore
&lt;/div>
&lt;/h3>
&lt;p>Yes, I&amp;rsquo;ve talked about &lt;a href="/2013/07/03/hstore-vs-json/">this one before&lt;/a>, yet still not enough people are using it. Of this list of datatypes this is one that could also have benefit for most if not all applications.&lt;/p>
&lt;p>Hstore is a key-value store directly within Postgres. This means you can easily add new keys and values &lt;em>(optionally)&lt;/em>, without having to run a migration to set up new columns. Further, you can still get great performance by using GIN and GiST indexes with them, which automatically index all keys and values for the hstore.&lt;/p>
&lt;p>&lt;em>It&amp;rsquo;s of note that hstore is an extension and not enabled by default. If you want the ins and outs of getting hands on with it, give the article on &lt;a href="http://postgresguide.com/sexy/hstore.html">Postgres Guide&lt;/a> a read.&lt;/em>&lt;/p>
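&lt;p>&lt;em>A minimal sketch of the above (the products table here is hypothetical):&lt;/em>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">CREATE EXTENSION hstore;

CREATE TABLE products (
    id         serial PRIMARY KEY,
    attributes hstore
);

-- A GIN index covers all keys and values in the hstore
CREATE INDEX products_attributes_idx ON products USING gin (attributes);

INSERT INTO products (attributes) VALUES ('color => "blue", size => "M"');

SELECT * FROM products WHERE attributes -> 'color' = 'blue';
&lt;/code>&lt;/pre>&lt;/div>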
&lt;h3 id="range-types" >
&lt;div>
Range types
&lt;/div>
&lt;/h3>
&lt;p>If there is ever a time where you have two columns in your database, one being a from and another being a to, you probably want to be using &lt;a href="http://www.postgresql.org/docs/9.2/static/rangetypes.html">range types&lt;/a>. Range types are just that: a range of values. A super common use of them is when doing anything with calendaring. The place where they really become useful is in their ability to apply constraints on those ranges. This means you can make sure you don&amp;rsquo;t have overlapping time issues, and don&amp;rsquo;t have to build heavy application logic to accomplish it.&lt;/p>
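&lt;p>&lt;em>As an example, an exclusion constraint on a range can prevent double-booking (the reservations table here is hypothetical; btree_gist is needed for the equality part of the constraint):&lt;/em>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">CREATE EXTENSION btree_gist;

CREATE TABLE reservations (
    room_id int,
    during  tstzrange,
    -- Reject any two rows for the same room with overlapping ranges
    EXCLUDE USING gist (room_id WITH =, during WITH &amp;&amp;)
);
&lt;/code>&lt;/pre>&lt;/div>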
&lt;h3 id="timestamp-with-timezone" >
&lt;div>
Timestamp with Timezone
&lt;/div>
&lt;/h3>
&lt;p>Timestamps are annoying, plain and simple. If you&amp;rsquo;ve re-invented handling different timezones within your application, you&amp;rsquo;ve wasted plenty of time and likely done it wrong. And if you&amp;rsquo;re using plain timestamps within your application, there&amp;rsquo;s a good chance they don&amp;rsquo;t even mean what you think they mean. Timestamp with timezone, or timestamptz, stores an unambiguous point in time and displays it in your session&amp;rsquo;s timezone. This makes it easy to convert between timezones, know exactly what you&amp;rsquo;re dealing with, and will in short save you a ton of time. There&amp;rsquo;s seldom a case where you shouldn&amp;rsquo;t be using it.&lt;/p>
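&lt;p>A quick sketch of the behavior: timestamptz stores an unambiguous point in time and renders it in whatever timezone your session is set to.&lt;/p>
&lt;pre>&lt;code>CREATE TABLE events (happened_at timestamptz);
INSERT INTO events VALUES ('2014-05-07 12:00:00-08');

SET timezone = 'UTC';
SELECT happened_at FROM events;  -- 2014-05-07 20:00:00+00

-- or convert explicitly for display
SELECT happened_at AT TIME ZONE 'America/New_York' FROM events;
&lt;/code>&lt;/pre>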
&lt;h3 id="uuid" >
&lt;div>
UUID
&lt;/div>
&lt;/h3>
&lt;p>Integers as primary keys aren&amp;rsquo;t great. Sure, if you&amp;rsquo;re running a small blog they work fine, but if your application has to scale to a large size then integers can create problems. First, you can run out of them; second, they can make other details such as sharding a little more annoying. At the same time they are super readable. However, using the actual UUID datatype and the extension to automatically generate them can be incredibly handy if you have to scale an application.&lt;/p>
&lt;p>&lt;em>Similar to hstore, there&amp;rsquo;s an &lt;a href="http://www.postgresql.org/docs/9.3/static/uuid-ossp.html">extension&lt;/a> that makes the UUID much more useful.&lt;/em>&lt;/p>
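&lt;p>With the extension enabled, UUID primary keys are only a couple of lines (a sketch; the accounts table is made up):&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE accounts (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    email text
);

INSERT INTO accounts (email) VALUES ('craig@example.com');
SELECT id FROM accounts;  -- a randomly generated v4 UUID
&lt;/code>&lt;/pre>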
&lt;h3 id="binary-json" >
&lt;div>
Binary JSON
&lt;/div>
&lt;/h3>
&lt;p>This isn&amp;rsquo;t available yet, but will be in Postgres 9.4. &lt;a href="/2014/03/24/Postgres-9.4-Looking-up/">Binary JSON&lt;/a> is of course JSON directly within your database, but it also lets you add GIN indexes directly onto JSON. This means a much simpler setup for not only inserting JSON, but also getting fast reads. If you want to learn a bit more about this, &lt;a href="/training/index.html">sign up&lt;/a> to &lt;a href="/training/index.html">get notified&lt;/a> of training regarding the upcoming PostgreSQL 9.4 release.&lt;/p>
&lt;h3 id="money" >
&lt;div>
Money
&lt;/div>
&lt;/h3>
&lt;p>Please don&amp;rsquo;t use this&amp;hellip; The money datatype assumes a single currency type, and generally brings with it more caveats than simply using a numeric type.&lt;/p>
&lt;h3 id="more" >
&lt;div>
More
&lt;/div>
&lt;/h3>
&lt;p>It&amp;rsquo;s already been pointed out on Twitter that I missed a few. To give a quick highlight of some others:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://www.craigkerstiens.com/2012/08/20/arrays-in-postgres/">Arrays&lt;/a>&lt;/li>
&lt;li>Interval – time intervals, such as &amp;lsquo;1 hour&amp;rsquo;, &amp;lsquo;1 day&amp;rsquo;&lt;/li>
&lt;li>ISN – international standard numbers (ISBN, EAN, UPC, etc.), which should help for anything with products&lt;/li>
&lt;li>Inet – tracking IP addresses&lt;/li>
&lt;/ul>
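&lt;p>A couple of these are easy to try in a quick sketch:&lt;/p>
&lt;pre>&lt;code>-- interval arithmetic just works
SELECT now() - interval '1 day';

-- inet understands network containment
SELECT '192.168.1.5'::inet &lt;&lt; '192.168.1.0/24'::inet;  -- true
&lt;/code>&lt;/pre>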
&lt;h3 id="in-conclusion" >
&lt;div>
In conclusion
&lt;/div>
&lt;/h3>
&lt;p>What&amp;rsquo;d I miss? What are your favorite types? Let me know &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>, or sign up below for updates on Postgres content and first access to training.&lt;/p></description></item><item><title>What you need to know about April 7 and your security on the web.</title><link>/2014/04/08/What-you-need-to-know-about-April-7-and-your-security-on-the-web./</link><pubDate>Tue, 08 Apr 2014 12:55:56 -0800</pubDate><guid>/2014/04/08/What-you-need-to-know-about-April-7-and-your-security-on-the-web./</guid><description>&lt;!-- raw HTML omitted -->
&lt;ul>
&lt;li>Yahoo&lt;/li>
&lt;li>Amazon.com&lt;/li>
&lt;li>Netflix&lt;/li>
&lt;li>Various banks&lt;/li>
&lt;li>Many more&lt;/li>
&lt;/ul>
&lt;p>If you&amp;rsquo;re interested in more technical details you can &lt;a href="http://www.heartbleed.com">follow along&lt;/a> or on the &lt;a href="https://blog.heroku.com/archives/2014/4/8/openssl_heartbleed_security_update">Heroku blog&lt;/a>.&lt;/p>
&lt;p>The short of it is you, yes you as in everyone, should rotate your passwords once all websites are safe. For further details please continue reading.&lt;/p>
&lt;h3 id="what-does-the-vulnerability-mean" >
&lt;div>
What does the vulnerability mean
&lt;/div>
&lt;/h3>
&lt;p>&lt;!-- raw HTML omitted --> In this case it allowed an external party to acquire a moderate amount of data from the servers running your website. Extremely clear examples (such as the one shown on the right) highlight random third parties easily acquiring the usernames and passwords of recently logged-in Yahoo mail users.&lt;/p>
&lt;h3 id="the-first-step" >
&lt;div>
The first step
&lt;/div>
&lt;/h3>
&lt;p>The first step in resolving this is actually not a step required by you at all, unless you&amp;rsquo;re running a production website online. The first step requires the developers running the site to update it so it is no longer vulnerable. This was able to happen as early as April 7, and many major sites were fully updated and again safe as of April 8.&lt;/p>
&lt;h3 id="still-area-for-concern" >
&lt;div>
Still area for concern
&lt;/div>
&lt;/h3>
&lt;p>With security vulnerabilities there are two key things to consider. First is the vulnerability itself; second is whether it&amp;rsquo;s theoretical or can easily be acted upon. Yes, there&amp;rsquo;s a range here. One of the most unfortunate pieces, from talking to those that know about security, is that this was extremely trivial to act upon.&lt;/p>
&lt;p>&lt;em>This is made even worse in that this vulnerability has existed for two years without many knowing about it, meaning people have had the ability to snoop and collect parts of your data for that entire time.&lt;/em>&lt;/p>
&lt;h3 id="what-to-do" >
&lt;div>
What to do?
&lt;/div>
&lt;/h3>
&lt;p>First things first, be extremely cautious with any major website you use for anything important. For any account with a password that you care about, you should cease logging into it &lt;strong>until you know it&amp;rsquo;s safe&lt;/strong>. As of the morning of April 8 here is a &lt;a href="https://gist.github.com/dberkholz/10169691">list of sites that were safe and ones that were vulnerable&lt;/a>. You can check any site today &lt;a href="http://filippo.io/Heartbleed/">here&lt;/a>.&lt;/p>
&lt;p>Once it&amp;rsquo;s clear that a site is now updated and safe, either via that list or the latter tool, you should change your password. Given how long this has existed and the ease of compromising it, it&amp;rsquo;s safe to assume all of your internet passwords and the data within those accounts could have been compromised. This means you should change the password for any website you have logged into within the last two years. Changing your passwords prevents anyone from using captured credentials again.&lt;/p>
&lt;p>&lt;em>I am not a security expert or analyst, but have heavily interacted with many that are in dealing with this incident. This advice is high level and aimed at non-technical readers; if you have any questions or feedback please let me know on Twitter &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>&lt;/em>&lt;/p></description></item><item><title>Some non-traditional marketing tips</title><link>/2014/03/31/Some-non-traditional-marketing-tips/</link><pubDate>Mon, 31 Mar 2014 12:55:56 -0800</pubDate><guid>/2014/03/31/Some-non-traditional-marketing-tips/</guid><description>&lt;p>Marketing is generally unexciting to a ton of engineers, until it brings eyeballs which bring feedback and dollars. Marketing doesn&amp;rsquo;t always have to be cheesy campaigns or ads; it can often just be surfacing the things your customers actually do want to care about. My favorite type of marketing is when a service sells me on something at the exact time I want it. Here are a few short tips on some non-traditional marketing that won&amp;rsquo;t seem sleazy but still can work quite well.&lt;/p>
&lt;h3 id="email-subscriptions-to-your-blog" >
&lt;div>
Email subscriptions to your blog
&lt;/div>
&lt;/h3>
&lt;p>RSS is pretty dead; Google went and killed it along with Google Reader. Sure, there are some decent replacements if you&amp;rsquo;re really tied to it. In particular &lt;a href="http://newsblur.com/">newsblur&lt;/a> by &lt;a href="http://www.twitter.com/samuelclay">@samuelclay&lt;/a> is a great reader. But nowadays content emerges on Twitter, Facebook, and ranking services, then is later discovered via search. Both of these work pretty well, but Twitter is ephemeral for so many. Email still converts incredibly well; if people are abandoning RSS but still care about your content, give them the ability to have it put right in front of them via email.&lt;/p>
&lt;h3 id="market-in-transactional-emails" >
&lt;div>
Market in transactional emails
&lt;/div>
&lt;/h3>
&lt;p>Have emails that include receipts? Account confirmations? General notices? No, not a monthly newsletter! Transactional emails are obviously valuable to your users. Why not include a small call-out to your latest announcement? Have a central hook that your emails can check and simply include a small call to action there.&lt;/p>
&lt;p>&lt;em>Credit to &lt;a href="http://www.twitter.com/stevenbristol">@stevenbristol&lt;/a> on &lt;a href="http://strongbusinessespodcast.com/16571/149964-episode-17-security-authy-and-disney">strong business podcast&lt;/a> for this one&lt;/em>&lt;/p>
&lt;h3 id="retarget-to-your-existing-users" >
&lt;div>
Retarget to your existing users
&lt;/div>
&lt;/h3>
&lt;p>In a similar vein to notifying your existing customers of news in transactional emails, you should be doing this all over the web. Retargeting is great for converting people once you&amp;rsquo;ve already got them on a landing page, but it&amp;rsquo;s also incredibly useful for getting existing users to &lt;a href="http://insideintercom.io/talk-product-strategy-saying/">use a specific feature&lt;/a>. If you track whether they&amp;rsquo;ve used a feature, retargeting is a great way to make them aware of it, and once they&amp;rsquo;ve used it, just count it as a conversion.&lt;/p>
&lt;p>&lt;em>My favorite retargeting provider &lt;a href="http://www.perfectaudience.com/">perfect audience&lt;/a> makes this quite convenient as they allow a bit more control than most retargeting services&lt;/em>&lt;/p>
&lt;h3 id="in-conclusion" >
&lt;div>
In conclusion
&lt;/div>
&lt;/h3>
&lt;p>Marketing doesn&amp;rsquo;t have to be throwing your product and messaging in someone&amp;rsquo;s face, but you should make your users aware of it. The more engaged they are, the more they&amp;rsquo;ll stick around and be happy about using your product, assuming you&amp;rsquo;ve built a good one. What are some of your favorite tips?&lt;/p></description></item><item><title>A year's look at Postgres</title><link>/2014/03/26/A-years-look-at-Postgres/</link><pubDate>Wed, 26 Mar 2014 12:55:56 -0800</pubDate><guid>/2014/03/26/A-years-look-at-Postgres/</guid><description>&lt;p>A couple of years back I started blogging more regularly; though I&amp;rsquo;d done this off and on before, this time I kept at it. A common theme emerged, with content on Postgres about once a month, since most of what was out there was much more reference oriented. A bit after that I connected with &lt;a href="http://www.twitter.com/peterc">petercooper&lt;/a>, who runs quite a few weekly email newsletters. As someone that&amp;rsquo;s been interested in helping give others a good reason to create content, the obvious idea of &lt;a href="http://www.postgresweekly.com">Postgres Weekly&lt;/a> emerged.&lt;/p>
&lt;p>Since then we&amp;rsquo;ve now had the newsletter running for over a year, helped surface quite a bit of content, and grown to over 5,000 subscribers. First if you&amp;rsquo;re not subscribed, then go &lt;a href="http://www.postgresweekly.com">subscribe now&lt;/a>.&lt;/p>
&lt;p>And if you need some inspiration or just want to reminisce with me&amp;hellip; here&amp;rsquo;s a look back at a few highlights over the past year:&lt;/p>
&lt;h3 id="the-inagural-issue" >
&lt;div>
The inaugural issue
&lt;/div>
&lt;/h3>
&lt;h4 id="postgres-the-bits-you-havent-foundhttppostgres-bitsherokuappcomutm_sourcecraigkerstiensutm_mediumblog" >
&lt;div>
&lt;a href="http://postgres-bits.herokuapp.com/?utm_source=craigkerstiens&amp;amp;utm_medium=blog">Postgres: The Bits You Haven&amp;rsquo;t Found&lt;/a>
&lt;/div>
&lt;/h4>
&lt;p>A slide-deck from a presentation at Heroku&amp;rsquo;s Waza conference that highlights many of the more unknown and rare features within Postgres, including &amp;lsquo;WITH&amp;rsquo;, arrays, pub/sub, and hstore.&lt;/p>
&lt;h4 id="open-source-releasepostgresql-hllhttpblogaggregateknowledgecom20130204open-source-release-postgresql-hllutm_sourcecraigkerstiensutm_mediumblog" >
&lt;div>
&lt;a href="http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/?utm_source=craigkerstiens&amp;amp;utm_medium=blog">Open Source Release:postgresql-hll&lt;/a>
&lt;/div>
&lt;/h4>
&lt;p>Aggregate Knowledge released Postgres HyperLogLog, which is a new Postgres datatype hll that strikes a balance between HyperLogLog and a simple set. This data type solves the problem of calculating uniques for a given data set efficiently both in performance and storage.&lt;/p>
&lt;p>&lt;em>The above is still one of my favorite extensions that most of the world doesn&amp;rsquo;t know about&lt;/em>&lt;/p>
&lt;h4 id="how-i-work-with-postgres---psql-my-postgresql-adminhttpwwwcraigkerstienscom20130213how-i-work-with-postgresutm_sourcecraigkerstiensutm_mediumbloga" >
&lt;div>
&lt;a href="http://www.craigkerstiens.com/2013/02/13/How-I-Work-With-Postgres/?utm_source=craigkerstiens&amp;amp;utm_medium=bloga">How I Work with Postgres - Psql, My PostgreSQL Admin&lt;/a>
&lt;/div>
&lt;/h4>
&lt;p>A common question for anyone new to, or even experienced with, Postgres is: what&amp;rsquo;s the best editor out there? Most people asking this are asking for a GUI editor; this post highlights much of the power in the CLI &amp;lsquo;psql&amp;rsquo; editor.&lt;/p>
&lt;h3 id="a-mix-of-notable-entries" >
&lt;div>
A mix of notable entries
&lt;/div>
&lt;/h3>
&lt;h4 id="issue-6httppostgresweeklycomissues6-dissecting-postgresql-cve-2013-1899httpblogblackwinghqcom201304082utm_sourcecraigkerstiensutm_mediumblog" >
&lt;div>
&lt;a href="http://postgresweekly.com/issues/6">Issue 6&lt;/a> &lt;a href="http://blog.blackwinghq.com/2013/04/08/2/?utm_source=craigkerstiens&amp;amp;utm_medium=blog">Dissecting PostgreSQL CVE-2013-1899&lt;/a>
&lt;/div>
&lt;/h4>
&lt;p>After the heavily publicized and very serious security vulnerability was patched last week, Blackwing Intelligence took the chance to dig in. Read more on the details of the vulnerability, such as what damage can be done and the basics of how it&amp;rsquo;s exploitable.&lt;/p>
&lt;h4 id="issue-16httppostgresweeklycomissues16-tom-lane-explains-query-planner-videohttpwwwjustintvsfpugb419326732utm_sourcecraigkerstiensutm_mediumblog" >
&lt;div>
&lt;a href="http://postgresweekly.com/issues/16">Issue 16&lt;/a> &lt;a href="http://www.justin.tv/sfpug/b/419326732?utm_source=craigkerstiens&amp;amp;utm_medium=blog">Tom Lane Explains Query Planner video&lt;/a>
&lt;/div>
&lt;/h4>
&lt;p>Tom Lane, one of the major contributors to Postgres and on the Postgres core team, was in San Francisco last week and gave a talk at the SF Postgres Users Group. Here&amp;rsquo;s the video from the talk where Tom explains the innards of the PostgreSQL query planner. Whether you&amp;rsquo;re a noob or a knowledgeable Postgres user, this is a must watch.&lt;/p>
&lt;h4 id="issue-35httppostgresweeklycomissues35-top-10-psql--commands-i-usehttpwwwchesnokcomdaily20131106top-10-psql-commands-i-use" >
&lt;div>
&lt;a href="http://postgresweekly.com/issues/35">Issue 35&lt;/a> &lt;a href="http://www.chesnok.com/daily/2013/11/06/top-10-psql-commands-i-use/">Top 10 psql ‘\’ commands I use&lt;/a>
&lt;/div>
&lt;/h4>
&lt;p>Psql is incredibly powerful, but the list of options within it can be overwhelming. Here&amp;rsquo;s a straightforward list of @selenamarie’s top 10 commands.&lt;/p>
&lt;h4 id="issue-38httppostgresweeklycomissues38-everyday-postgres-tuning-a-brand-new-server---the-10-minute-editionhttpwwwchesnokcomdaily20131113everyday-postgres-tuning-a-brand-new-server-the-10-minute-editionutm_sourcecraigkerstiensutm_mediumblog" >
&lt;div>
&lt;a href="http://postgresweekly.com/issues/38">Issue 38&lt;/a> &lt;a href="http://www.chesnok.com/daily/2013/11/13/everyday-postgres-tuning-a-brand-new-server-the-10-minute-edition/?utm_source=craigkerstiens&amp;amp;utm_medium=blog">Everyday Postgres: Tuning a brand-new server - the 10 minute edition&lt;/a>
&lt;/div>
&lt;/h4>
&lt;p>After a fresh install, there are probably a few knobs you want to tweak on Postgres. If you’re new to doing this, it can be a bit overwhelming. Here’s a quick primer on tuning a brand new server to be more properly configured.&lt;/p>
&lt;h3 id="and-the-latest-issuehttppostgresweeklycomissues51" >
&lt;div>
&lt;a href="http://postgresweekly.com/issues/51">And the latest issue&lt;/a>
&lt;/div>
&lt;/h3>
&lt;p>Which highlights a wealth of information on &lt;a href="http://postgresweekly.com/issues/51">jsonb&lt;/a>, and a bit of various knowledge touching on &lt;a href="http://hans.io/blog/2014/03/25/postgresql_cluster/index.html?utm_source=craigkerstiens&amp;amp;utm_medium=blog">cluster&lt;/a>, &lt;a href="http://practiceovertheory.com/blog/2013/07/12/recursive-query-is-recursive/?utm_source=craigkerstiens&amp;amp;utm_medium=blog">recursive queries with CTEs&lt;/a>, and &lt;a href="http://www.davidhampgonsalves.com/Postgres-ranges/?utm_source=craigkerstiens&amp;amp;utm_medium=blog">range types&lt;/a>.&lt;/p>
&lt;h3 id="in-conclusion" >
&lt;div>
In conclusion
&lt;/div>
&lt;/h3>
&lt;p>What did you like? Any favorites I missed? What would you like to see more of? Let me know &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a> or at &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens at gmail.com&lt;/a>&lt;/p></description></item><item><title>PostgreSQL 9.4 - Looking up (with JSONB and logical decoding)</title><link>/2014/03/24/PostgreSQL-9.4-Looking-up-with-JSONB-and-logical-decoding/</link><pubDate>Mon, 24 Mar 2014 12:55:56 -0800</pubDate><guid>/2014/03/24/PostgreSQL-9.4-Looking-up-with-JSONB-and-logical-decoding/</guid><description>&lt;p>Just a few weeks back I wrote an article discussing many of the things that were likely to miss making the &lt;a href="http://www.craigkerstiens.com/2014/02/15/PostgreSQL-9.4-What-I-Wanted/">9.4 PostgreSQL release&lt;/a>. In the few weeks since that post the landscape has already changed, and much for the positive.&lt;/p>
&lt;p>&lt;em>The lesson here is never count Postgres out&lt;/em>. As &lt;a href="http://www.linuxinsider.com/story/Bruce-Momjian-PostrgreSQL-Prefers-the-Scenic-Route-80045.html">Bruce discussed in a recent interview&lt;/a>, Postgres is slow and steady, but much like the turtle it can win the race.&lt;/p>
&lt;p>So onto the actual features:&lt;/p>
&lt;h3 id="jsonb" >
&lt;div>
JSONB
&lt;/div>
&lt;/h3>
&lt;p>JSON has existed for a while in Postgres. Though the JSON that exists today simply validates that your text is valid JSON, then goes on to store it in a text field. This is fine, but not overly performant. If you do need some flexibility of your schema and performance without much effort then hstore may already work for you today, you can of course read more on this in an old post comparing &lt;a href="http://www.craigkerstiens.com/2013/07/03/hstore-vs-json/">hstore to json&lt;/a>.&lt;/p>
&lt;p>But let&amp;rsquo;s assume you do want JSON and a full document store, which is perfectly reasonable. Your best option today is still the JSON datatype. If you&amp;rsquo;re retrieving full documents this is fine; however, if you&amp;rsquo;re searching/filtering on values within those documents then you need to take advantage of some functional indexing. You can do this with some of the &lt;a href="http://www.postgresql.org/docs/9.3/static/functions-json.html">built-in operators&lt;/a> or with full &lt;a href="https://postgres.heroku.com/blog/past/2013/6/5/javascript_in_your_postgres/">JS in Postgres&lt;/a>. This is a little more work, but it&amp;rsquo;s also very possible to get good performance.&lt;/p>
&lt;p>Finally, onto the perfect world, where JSON isn&amp;rsquo;t just text in your database. For some time there&amp;rsquo;s been a discussion around hstore and its future progress, and of course the future of JSON in Postgres. These two worlds have finally converged for PostgreSQL 9.4, giving you &lt;a href="http://www.postgresql.org/message-id/E1WRpmB-0002et-MT@gemulon.postgresql.org">the best of both worlds&lt;/a>. With what was known as hstore2, by &lt;a href="http://obartunov.livejournal.com/177247.html">The Russians&lt;/a>, under the covers, and collective efforts on JSONB (a binary representation of JSON) that included all the JSON interfaces you&amp;rsquo;d expect, we now have full document storage and awesome performance with little effort.&lt;/p>
&lt;p>Digging in a little further, why does it matter that it&amp;rsquo;s a binary representation? Well, under the covers building on the hstore functionality brings along some of the awesome index types in Postgres, namely GIN and possibly, in the future, GiST. These indexes will automatically index all keys and values within a document, meaning you don&amp;rsquo;t have to manually create individual functional indexes. Oh, and they&amp;rsquo;re &lt;a href="http://thebuild.com/presentations/pg-as-nosql-pgday-fosdem-2013.pdf">fast and often small&lt;/a> on disk as well.&lt;/p>
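&lt;p>Since 9.4 isn&amp;rsquo;t final yet this is subject to change, but the intended usage looks roughly like this (a sketch against the development builds; the table is made up):&lt;/p>
&lt;pre>&lt;code>CREATE TABLE docs (
    id serial PRIMARY KEY,
    data jsonb
);

-- one GIN index covers all keys and values in the documents
CREATE INDEX docs_data_idx ON docs USING GIN (data);

INSERT INTO docs (data) VALUES ('{"user": "craig", "tags": ["postgres", "json"]}');

-- containment query that can use the GIN index
SELECT * FROM docs WHERE data @> '{"user": "craig"}';
&lt;/code>&lt;/pre>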
&lt;h3 id="logical-decoding" >
&lt;div>
Logical Decoding
&lt;/div>
&lt;/h3>
&lt;p>Logical replication was another feature that I said was likely missing. Here there isn&amp;rsquo;t the same positive news as with JSONB, as there&amp;rsquo;s not a 100% usable feature available. Yet there is a big silver lining. &lt;a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b89e151054a05f0f6d356ca52e3b725dd0505e53">Committed just over a week ago&lt;/a> was logical decoding. This means that we can decode the WAL (Write-Ahead Log) into logical changes. In layman&amp;rsquo;s terms, something that&amp;rsquo;s unreadable to anything but Postgres (and version dependent in cases) can be interpreted as a series of &lt;code>INSERT&lt;/code>s, &lt;code>UPDATE&lt;/code>s, &lt;code>DELETE&lt;/code>s, etc. With these logical changes you could then start to get closer to cross-version upgrades and eventually multi-master.&lt;/p>
&lt;p>This commit doesn&amp;rsquo;t mean all the pieces are in the core of Postgres today. What it does mean is that the part required of the Postgres core is done. The rest, which includes sending the logical replication stream somewhere and then having something apply it, can be developed fully as an extension.&lt;/p>
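&lt;p>For the curious, the committed piece can already be poked at with the test_decoding example plugin (a sketch; this assumes a 9.4 development build with wal_level = logical and at least one replication slot configured):&lt;/p>
&lt;pre>&lt;code>-- create a slot using the example output plugin
SELECT * FROM pg_create_logical_replication_slot('test_slot', 'test_decoding');

-- make a change somewhere...
INSERT INTO foo VALUES (1);

-- ...then read it back as a stream of logical changes
SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);

SELECT pg_drop_replication_slot('test_slot');
&lt;/code>&lt;/pre>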
&lt;h3 id="in-conclusion" >
&lt;div>
In Conclusion
&lt;/div>
&lt;/h3>
&lt;p>Postgres 9.4 isn&amp;rsquo;t 100% complete yet, as the commitfest is still going on. You can follow along on the &lt;a href="http://www.postgresql.org/list/pgsql-hackers/2014-03/">postgres hackers mailing list&lt;/a> or on the &lt;a href="https://commitfest.postgresql.org/">commitfest app&lt;/a> where you can follow specific patches or even chip in on reviewing. And of course I&amp;rsquo;ll do my best to continue to highlight useful features here and surface them on &lt;a href="http://www.postgresweekly.com">Postgres Weekly&lt;/a> as well.&lt;/p></description></item><item><title>Tracking Month over Month Growth in SQL</title><link>/2014/02/26/Tracking-Month-over-Month-Growth-in-SQL/</link><pubDate>Wed, 26 Feb 2014 12:55:56 -0800</pubDate><guid>/2014/02/26/Tracking-Month-over-Month-Growth-in-SQL/</guid><description>&lt;p>&lt;!-- raw HTML omitted --> In analyzing a business I commonly look at reports through two lenses. One is doing various cohort analyses. The other is looking for Month over Month, Week over Week, or some other X over X growth in terms of a percentage. This second form of looking at data is relevant when you&amp;rsquo;re in a SaaS business or essentially anything that does recurring billing. In such a business, focusing on your MRR and working on &lt;a href="http://www.amazon.com/dp/B003XVYKRW?tag=mypred-20">growing your MRR is how success can often be measured&lt;/a>.&lt;/p>
&lt;p>I&amp;rsquo;ll jump right in. First, let&amp;rsquo;s assume you have some method of querying your revenue. In this case you may have some basic query similar to:&lt;/p>
&lt;pre>&lt;code>SELECT date_trunc('month', mydate) as date,
sum(mymoney) as revenue
FROM foo
GROUP BY date
ORDER BY date ASC;
&lt;/code>&lt;/pre>
&lt;p>This should give you a nice clean result:&lt;/p>
&lt;pre>&lt;code> date | revenue
------------------------+----------
2013-10-01 00:00:00+00 | 10000
2013-11-01 00:00:00+00 | 11000
2013-12-01 00:00:00+00 | 11500
&lt;/code>&lt;/pre>
&lt;p>Now this is great, but the first thing I want to do is start to see what my percentage growth month over month is. Surprise, surprise, I can do this directly in SQL. To do so I&amp;rsquo;ll use a &lt;a href="http://postgresguide.com/tips/window.html">window function&lt;/a>, specifically the &lt;a href="http://www.postgresql.org/docs/9.3/static/functions-window.html">lag function&lt;/a>. According to the Postgres docs:&lt;/p>
&lt;p>&lt;em>lag(value any [, offset integer [, default any ]]) same type as value returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null&lt;/em>&lt;/p>
&lt;p>Essentially it orders the rows based on the &lt;a href="http://www.postgresql.org/docs/9.3/static/tutorial-window.html">window definition&lt;/a> and then pulls in the value from the row before. So in action it looks something like:&lt;/p>
&lt;pre>&lt;code>SELECT date_trunc('month', mydate) as date,
sum(mymoney) as revenue,
lag(sum(mymoney), 1) over w as previous_month_revenue
FROM foo
GROUP BY date
WINDOW w as (order by date_trunc('month', mydate))
ORDER BY date ASC;
&lt;/code>&lt;/pre>
&lt;p>Combining this and making it a bit prettier (with some casting to numeric and a bit of formatting) to get a percentage:&lt;/p>
&lt;pre>&lt;code>SELECT date_trunc('month', mydate) as date,
sum(mymoney) as revenue,
round((cast(sum(mymoney) as numeric) / lag(sum(mymoney), 1) over w - 1.0) * 100, 1) as growth
FROM foo
GROUP BY date
WINDOW w as (order by date_trunc('month', mydate))
ORDER BY date ASC;
&lt;/code>&lt;/pre>
&lt;p>And you finally get a nice clean output of your month over month growth directly &lt;a href="http://www.amazon.com/dp/B0043EWUQQ?tag=mypred-20">in SQL&lt;/a>:&lt;/p>
&lt;pre>&lt;code> date | revenue | growth
------------------------+----------+--------
2013-10-01 00:00:00+00 | 10000 | null
2013-11-01 00:00:00+00 | 11000 | 10.0
2013-12-01 00:00:00+00 | 11500 | 4.5
&lt;/code>&lt;/pre></description></item><item><title>PostgreSQL 9.4 - What I was hoping for</title><link>/2014/02/25/PostgreSQL-9.4-What-I-Wanted/</link><pubDate>Tue, 25 Feb 2014 12:55:56 -0800</pubDate><guid>/2014/02/25/PostgreSQL-9.4-What-I-Wanted/</guid><description>&lt;p>There&amp;rsquo;s no doubt that the &lt;a href="/2014/02/02/Examining-PostgreSQL-9.4/">9.4 release&lt;/a> of PostgreSQL will have some great improvements. However, for all of the improvements it is delivering, it had the promise of being perhaps the most impactful release of &lt;a href="http://www.amazon.com/dp/B008IGIKY6?tag=mypred-20">Postgres&lt;/a> yet. Several of the features that would have given it my stamp of best release in at least 5 years are already not making it, and a few others are still on the border. Here&amp;rsquo;s a look at a few of the things that were hoped for but won&amp;rsquo;t arrive for at least another 18 months.&lt;/p>
&lt;h3 id="upsert" >
&lt;div>
Upsert
&lt;/div>
&lt;/h3>
&lt;p>Upsert, merge, whatever you want to call it, this has been a sore spot for some time now. Essentially this is: insert based on this ID, or if that key already exists, update the other values. This was something being worked on pretty early in this release cycle, with progress continuing throughout the process. Yet as progress was made, so were extended discussions about syntax, approach, etc. In the end, two differing views on how it should be implemented have left the patch sitting there, with other thoughts on an implementation but no code ready to commit.&lt;/p>
&lt;p>At the same time I&amp;rsquo;ll acknowledge upsert is a hard problem to address. The locking and concurrency issues are non-trivial, but regardless of those, having this in there mostly kills the final argument for anyone to choose MySQL.&lt;/p>
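&lt;p>In the meantime the common workaround looks something like the following writable-CTE pattern (a sketch; the counters table is made up, and note it still has races under heavy concurrency, which is exactly why a built-in upsert is hard):&lt;/p>
&lt;pre>&lt;code>WITH updated AS (
    UPDATE counters SET value = value + 1
    WHERE id = 1
    RETURNING id
)
INSERT INTO counters (id, value)
SELECT 1, 1
WHERE NOT EXISTS (SELECT 1 FROM updated);
&lt;/code>&lt;/pre>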
&lt;h3 id="better-json" >
&lt;div>
Better JSON
&lt;/div>
&lt;/h3>
&lt;p>JSON in Postgres is super flexible, powerful, and &lt;strong>generally slow&lt;/strong>. Postgres does validation and some parsing of JSON, but without something like &lt;a href="https://postgres.heroku.com/blog/past/2013/6/5/javascript_in_your_postgres/">PLV8&lt;/a> or &lt;a href="http://www.craigkerstiens.com/2013/05/29/postgres-indexes-expression-or-functional-indexes/">functional indexes&lt;/a> you may not get great performance. This is because under the covers the JSON is represented as text, and as a result many of the more powerful indexes that could lend benefit, such as GIN or GiST, simply don&amp;rsquo;t apply here.&lt;/p>
&lt;p>As a related effort, &lt;a href="http://postgresguide.com/sexy/hstore.html">hstore&lt;/a>, the key/value store, is being updated. This new support will add types and nesting, making it much more usable overall. However, syntax matching how JSON functions isn&amp;rsquo;t guaranteed to be part of it. The proposal and actual work is still there and not rejected yet, but it looks heavily at risk. Backing a new binary representation of JSON with hstore 2 would deliver so many benefits, further building upon the foundation of hstore, JSON, and PLV8 that exists today for Postgres.&lt;/p>
&lt;h3 id="apt-get-for-your-extensions" >
&lt;div>
apt-get for your extensions
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;m almost not even sure where to start with this one. The notion within the Postgres community is that packaging for distros is super simple and extensions should just be packaged for them. Then there&amp;rsquo;s &lt;a href="http://pgxn.org/">PGXN&lt;/a>, the Postgres extension network, where you can download and compile and muck with annoying settings to get extensions to build. This proposal would have delivered a built-in installer much like NPM or rubygems or PyPI, and the ability for someone to simply say install extension from this centralized repository. No, it wasn&amp;rsquo;t setting out to solve the issue of having a single repository, but it would make it much easier for people to run one.&lt;/p>
&lt;p>For all the awesomeness that exists in extensions such as &lt;a href="http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog">HyperLogLog&lt;/a>, &lt;a href="http://www.craigkerstiens.com/2012/10/18/connecting_to_redis_from_postgres/">foreign data wrappers&lt;/a>, and &lt;a href="http://madlib.net/">madlib&lt;/a>, there are hundreds of other extensions that could be written and be valuable. They don&amp;rsquo;t even all require C; they could fully exist in JavaScript with PLV8. Yet I&amp;rsquo;m on the fence about encouraging people to write such extensions, because if no one uses them then much of the point in the reusability of an extension is lost. Here&amp;rsquo;s hoping there&amp;rsquo;s a change of opinion in the future: that packaging is treated as a solved problem, and that creating an ecosystem where others can contribute to the Postgres world without knowing C is a positive thing.&lt;/p>
&lt;h3 id="logical-replication" >
&lt;div>
Logical replication
&lt;/div>
&lt;/h3>
&lt;p>When I first heard this might have some shot at making it into 9.4 I was shocked. This is something that, while some may not take notice of it, I&amp;rsquo;ve felt the pain of for many years. Logical replication means, in short, enabling upgrades across PostgreSQL versions without a dump and restore, but even more so laying the groundwork for more complicated architectures like perhaps multi-master. Yes, even with logical replication in, there&amp;rsquo;s still plenty of work to do, but having the groundwork laid goes a long way. There are options for it today with third-party tools, but the management of these is painful at best.&lt;/p>
&lt;h3 id="in-conclusion" >
&lt;div>
In conclusion
&lt;/div>
&lt;/h3>
&lt;p>The positive of this one is that the building blocks are in and it&amp;rsquo;s continuing to make progress. It&amp;rsquo;s just that we&amp;rsquo;ll have to wait about 18 months, until the release of PostgreSQL 9.5, before it&amp;rsquo;s in our hands.&lt;/p></description></item><item><title>How I hack email</title><link>/2014/02/07/my-email-hacks/</link><pubDate>Fri, 07 Feb 2014 12:55:56 -0800</pubDate><guid>/2014/02/07/my-email-hacks/</guid><description>&lt;p>In a conversation with &lt;a href="http://www.twitter.com/alexbaldwin">@alexbaldwin&lt;/a> yesterday the topic of email came up, with each of us quickly diving into various observations: how it&amp;rsquo;s both awesome and a great form of communication/engagement, and how most people still do it really badly. Alex has some good experience with it, with hack design having over 100,000 subscribers. In a tangent in an entirely unrelated meeting with &lt;a href="http://www.twitter.com/mschoening">@mschoening&lt;/a> and others, it was suggested that instead of emailing a list you send out a ton of individual emails instead. Both of these reminded me that email is incredibly powerful, but taking advantage of its power has to be intentional.&lt;/p>
&lt;p>This is not about ways to get to inbox 0 or better manage your inflow of email. Rather it&amp;rsquo;s about how to get the maximum output out of the emails that you send, or the minimum output, depending on what you prefer.&lt;/p>
&lt;h3 id="1-email-to-100-vs-100-emails-to-1" >
&lt;div>
1 email to 100 vs. 100 emails to 1
&lt;/div>
&lt;/h3>
&lt;p>This is perhaps my favorite approach to get more efficient feedback and also to know how broad an impact something has. Most smaller companies, or groups within a company, have a mailing list that&amp;rsquo;s &lt;a href="mailto:all@yourcompany.com">all@yourcompany.com&lt;/a> or &lt;a href="mailto:ourgroup@mycompany.com">ourgroup@mycompany.com&lt;/a>. When people want to communicate out to the entire list it&amp;rsquo;s a great mechanism; when you want feedback from the entire company, it&amp;rsquo;s not.&lt;/p>
&lt;p>The reason is that most people will know how many are on that list and assume that someone else will pick it up. This concept is fairly common in physical settings, known as the &lt;a href="http://en.wikipedia.org/wiki/Bystander_effect">bystander effect&lt;/a>: individuals often do not offer help to a victim when other bystanders are present.&lt;/p>
&lt;p>Finally, in certain situations you&amp;rsquo;ll want to hear the same thing 100 times. Hearing something once doesn&amp;rsquo;t represent how many others echo it. You&amp;rsquo;ll only see so many +1s on a thread; getting 100 individual responses ensures you get not only the breadth of responses but also their amplitude.&lt;/p>
&lt;p>&lt;em>FWIW, I ran a test of this, sending an email to essentially all@heroku, then an individualized email in a similar form. The one directly addressed to people received 5x the responses, as well as more thorough responses, in the same time frame.&lt;/em>&lt;/p>
&lt;h3 id="scaling-requests-for-input" >
&lt;div>
Scaling requests for input
&lt;/div>
&lt;/h3>
&lt;p>The issue with the above is that most of the time you don&amp;rsquo;t want 100 responses from 100 people. Most of the time you want feedback from 2 or 3, then feedback from 4 or 5, then smaller feedback or revisions from the rest of that 100. This is actually how I craft blog posts: I start with broad messaging/theming. At that level there are truly 100 different directions it could go, and that kind of input is not helpful when I have to narrow it down to a single one. When collecting product/roadmap input, though, it can be helpful. Knowing which of the two I&amp;rsquo;m aiming for is critical in deciding on a method.&lt;/p>
&lt;h3 id="being-explicit-about-the-before-and-the-ask" >
&lt;div>
Being explicit about the before and the ask
&lt;/div>
&lt;/h3>
&lt;p>On the note of crafting a blog post, I usually do start with a request to 2 or 3 people to get general direction. This takes the form of: is this interesting? From here though there&amp;rsquo;s still further refinement. The next phase is: does this flow, does it make sense? Here having a broader list is helpful, so usually it&amp;rsquo;ll hit around 4 to 5 people. Finally I&amp;rsquo;ll revert to the 1 email to 100 people on a mailing list, asking for grammar input because mine is crap. Here I don&amp;rsquo;t mind the bystander effect, because I want people to intentionally filter, so it works well.&lt;/p>
&lt;p>The key at each step of the process is being extremely clear about what&amp;rsquo;s already been done. With a blog post as an example&amp;hellip; if I don&amp;rsquo;t explain that people have already reviewed it and set the goals, with some consensus that it meets them, and that several have been over it for flow, then no one knows that what I&amp;rsquo;m looking for now is grammar feedback.&lt;/p>
&lt;h3 id="circulating-through-people" >
&lt;div>
Circulating through people
&lt;/div>
&lt;/h3>
&lt;p>Email and requests are a time burden on people. I commonly diversify and circle through a set of people. Much in the same way I reach out to people to have drinks or coffee every so often, I aim not to go to the same person every week, and only that person, with the exception of my wife.&lt;/p>
&lt;p>Having more of a rotating basis of getting through people increases their eagerness to provide input. If I&amp;rsquo;m always going back to the same people they may feel slightly drained by my constant requests, and quite rightfully so. At the same time, while the input is good, diversifying where you receive it gives a broader perspective.&lt;/p>
&lt;h3 id="delayed-sending" >
&lt;div>
Delayed sending
&lt;/div>
&lt;/h3>
&lt;p>This is one that may be a little more obvious to people. But sending an email later to slow down a thread, to not seem overeager, or for whatever other reason you may have is hugely useful. There are really two tools I look to here: 1. &lt;a href="http://www.boomeranggmail.com/referral_download.html?ref=vsz82">Boomerang&lt;/a> and 2. &lt;a href="http://www.yesware.com">Yesware&lt;/a>. Both have slightly different benefits: Boomerang has a much simpler interface, Yesware better integration with Salesforce. Regardless of which you choose, if you ever want to type an email but send it at some point later, one of these is critical.&lt;/p>
&lt;h3 id="fin" >
&lt;div>
Fin.
&lt;/div>
&lt;/h3>
&lt;p>While this list is less of a defined process and more of a collection of random processes, several of these I&amp;rsquo;d be much less effective without, and the collection of all of them makes getting appropriate reactions from email incredibly useful. I&amp;rsquo;d love to hear what hacks you use to elicit positive impact from the emails you receive; as always, if you have feedback please drop me a note.&lt;/p></description></item><item><title>Examining Postgres 9.4 - A first look</title><link>/2014/02/02/Examining-PostgreSQL-9.4/</link><pubDate>Sun, 02 Feb 2014 12:55:56 -0800</pubDate><guid>/2014/02/02/Examining-PostgreSQL-9.4/</guid><description>&lt;p>&lt;a href="http://www.amazon.com/dp/B008IGIKY6?tag=mypred-20">PostgreSQL&lt;/a> is currently entering its final commit fest. While it&amp;rsquo;s still going, which means there could still be more great features to come, we can start to take a look now at what you can expect. This release seems to bring a lot of minor increments versus the bigger highlights of previous ones. At the same time there&amp;rsquo;s still a lot on the bubble that may or may not make it, which could entirely change the shape of this one. For a peek back at some of the past ones:&lt;/p>
&lt;h3 id="highlights-of-92" >
&lt;div>
Highlights of 9.2
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="/2013/01/10/more-on-postgres-performance/">pg_stat_statements&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://wiki.postgresql.org/wiki/Index-only_scans">Index only scans&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://postgres.heroku.com/blog/past/2012/12/6/postgres_92_now_available/#json_support">JSON Support&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://postgres.heroku.com/blog/past/2012/12/6/postgres_92_now_available/#range_type_support">Range types&lt;/a>&lt;/li>
&lt;li>Huge performance improvements&lt;/li>
&lt;/ul>
&lt;h3 id="highlights-of-93" >
&lt;div>
Highlights of 9.3
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="/2013/08/05/a-look-at-FDWs/">Postgres foreign data wrapper&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://postgres.heroku.com/blog/past/2013/9/9/postgres_93_now_available/#materialized_views">Materialized views&lt;/a>&lt;/li>
&lt;li>Checksums&lt;/li>
&lt;/ul>
&lt;h2 id="on-to-94" >
&lt;div>
On to 9.4
&lt;/div>
&lt;/h2>
&lt;p>With 9.4, instead of a simple list, let&amp;rsquo;s dive in a little deeper to the more noticeable ones.&lt;/p>
&lt;h3 id="pg_prewarm" >
&lt;div>
pg_prewarm
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;ll lead with one from which those who need it should see huge gains (read: larger apps that have a read replica they may eventually fail over to). &lt;code>pg_prewarm&lt;/code> will pre-warm your cache by loading data into memory. You may be interested in running &lt;code>pg_prewarm&lt;/code> before bringing up a new Postgres DB, or on a replica to keep it fresh.&lt;/p>
&lt;p>&lt;em>Why it matters&lt;/em> - If you have a read replica it won&amp;rsquo;t have the same cache as the leader. This can work great, as you can send queries to it and it&amp;rsquo;ll optimize its own cache. However, if you&amp;rsquo;re using it as a failover, when you do have to fail over you&amp;rsquo;ll be running in a degraded mode while your cache warms up. Running &lt;code>pg_prewarm&lt;/code> against it on a periodic basis will make the experience when you do fail over a much better one.&lt;/p>
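&lt;p>As a rough sketch of the usage (the &lt;code>accounts&lt;/code> table and its index name here are hypothetical), warming a relation and one of its indexes looks like:&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION pg_prewarm;

-- load the table, then its primary key index, into the buffer cache
SELECT pg_prewarm('accounts');
SELECT pg_prewarm('accounts_pkey');
&lt;/code>&lt;/pre>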
&lt;h3 id="refresh-materialized-view-concurrently" >
&lt;div>
Refresh materialized view concurrently
&lt;/div>
&lt;/h3>
&lt;p>Materialized views just came into Postgres in 9.3. The problem with them was that they were largely unusable, because they 1. didn&amp;rsquo;t auto-refresh and 2. when you did refresh them, it would lock the view while the refresh ran, making it unreadable during that time.&lt;/p>
&lt;p>Materialized views are often most helpful on large reporting tables that can take some time to generate. Often such a query can take 10-30 minutes or even more to run. If you&amp;rsquo;re unable to access said view during that time, it greatly dampens their usefulness. Now running &lt;code>REFRESH MATERIALIZED VIEW CONCURRENTLY foo&lt;/code> will regenerate it in the background, so long as you have a unique index on the view.&lt;/p>
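&lt;p>Putting the pieces together, a minimal sketch of the full workflow (the view and column names here are made up for illustration):&lt;/p>
&lt;pre>&lt;code>CREATE MATERIALIZED VIEW page_counts AS
  SELECT user_id, count(*) AS views
  FROM pageviews
  GROUP BY user_id;

-- a unique index is required for a concurrent refresh
CREATE UNIQUE INDEX page_counts_user_id ON page_counts (user_id);

-- readers can keep querying page_counts while this runs
REFRESH MATERIALIZED VIEW CONCURRENTLY page_counts;
&lt;/code>&lt;/pre>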
&lt;h3 id="ordered-set-aggregates" >
&lt;div>
Ordered Set Aggregates
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;m almost not really sure where to begin with this; the name itself almost makes me not want to take advantage of it. That said, what this enables is a few really awesome things that you could do before but that required a few extra steps.&lt;/p>
&lt;p>While there are plenty of aggregate functions in Postgres, getting something like the 95th or 99th percentile takes a little more effort. First you must order the entire set, then re-iterate over it to find the position you want. This is something I&amp;rsquo;ve commonly done by using a window function coupled with a CTE. Now it&amp;rsquo;s much easier:&lt;/p>
&lt;pre>&lt;code>SELECT percentile_disc(0.95)
WITHIN GROUP (ORDER BY response_time)
FROM pageviews;
&lt;/code>&lt;/pre>
&lt;p>In addition to varying percentile functions you can get quite a few others including:&lt;/p>
&lt;ul>
&lt;li>Mode&lt;/li>
&lt;li>percentile_disc&lt;/li>
&lt;li>percentile_cont&lt;/li>
&lt;li>rank&lt;/li>
&lt;li>dense_rank&lt;/li>
&lt;/ul>
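<p>For example, &lt;code>mode()&lt;/code> uses the same &lt;code>WITHIN GROUP&lt;/code> syntax as the percentile functions; against the same hypothetical &lt;code>pageviews&lt;/code> table as above, the most common response time is simply:&lt;/p>
&lt;pre>&lt;code>SELECT mode() WITHIN GROUP (ORDER BY response_time)
FROM pageviews;
&lt;/code>&lt;/pre>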
&lt;h3 id="more-to-come" >
&lt;div>
More to come
&lt;/div>
&lt;/h3>
&lt;p>As I mentioned earlier, the commit fest is still ongoing, which means some things are still in flight. Here are a few that still offer huge promise but haven&amp;rsquo;t been committed yet:&lt;/p>
&lt;ul>
&lt;li>Insert on duplicate key, better known as upsert&lt;/li>
&lt;li>HStore 2 - various improvements to HStore&lt;/li>
&lt;li>JSONB - Binary format of JSON built on top of HStore&lt;/li>
&lt;li>Logical replication - this one looks like some pieces will make it, but not a wholly usable implementation.&lt;/li>
&lt;/ul></description></item><item><title>Where to go with developer content</title><link>/2014/01/28/where-to-go-developer-content/</link><pubDate>Tue, 28 Jan 2014 12:55:56 -0800</pubDate><guid>/2014/01/28/where-to-go-developer-content/</guid><description>&lt;p>Last week I wrote up some &lt;a href="/2014/01/16/developer-marketing-where-to-start-with-content/">initial steps for getting started with marketing a developer&lt;/a> focused product. The short of it was to quit trying to do &amp;ldquo;marketing&amp;rdquo; and just start putting out interesting material. A big part of this is sourcing material from your company&amp;rsquo;s developers. From there you want to gradually shift it from simply interesting technical posts to things that align with your core beliefs and add value to your customers.&lt;/p>
&lt;p>Perhaps the easiest way to do this is by highlighting some examples of it.&lt;/p>
&lt;h3 id="teach-them-how-to" >
&lt;div>
Teach them how to
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="https://www.tindie.com/">Tindie&lt;/a> is a marketplace focused on makers. Browsing their site is simply awesome, there&amp;rsquo;s everything from fully &lt;a href="https://www.tindie.com/products/browse/home/">built things&lt;/a> to raw supplies to let me &lt;a href="https://www.tindie.com/supplies/">start hacking&lt;/a>. The biggest problem though is they don&amp;rsquo;t tell me how to take advantage of so much on their site. Posts similar to New Relic&amp;rsquo;s on how they made their &lt;a href="http://blog.newrelic.com/2013/11/18/making-futurestack-badge/">awesome conference badges&lt;/a> with a ready made shopping list of components would both get me excited and teach me something I didn&amp;rsquo;t know how to do prior.&lt;/p>
&lt;p>Now a lot of this may seem obvious, but it&amp;rsquo;s not just about giving a how-to. This doesn&amp;rsquo;t belong in a readme or in product documentation. Instead, the activity of regularly crafting relevant stories that stretch how people think about hardware hacking should be a top-of-mind focus. It also positions you as a thought leader within the space. Right now there is no thought leader for makers, and there&amp;rsquo;s ample opportunity to be that.&lt;/p>
&lt;h3 id="timely-content" >
&lt;div>
Timely content
&lt;/div>
&lt;/h3>
&lt;p>Chipmaker &lt;a href="https://www.spark.io/">Spark.io&lt;/a> recently capitalized hugely on the &lt;a href="http://gizmodo.com/google-just-bought-nest-for-3-2-billion-1500503899">Nest acquisition&lt;/a> by writing a post, only days after, about how you can build an &lt;a href="http://blog.spark.io/2014/01/17/open-source-thermostat/">open source Nest for $70&lt;/a>. I suspect they didn&amp;rsquo;t have such a post just lying around waiting for the acquisition, and instead scrambled to get it all together almost as soon as it occurred.&lt;/p>
&lt;p>Over time the opportunity will always present itself in some form to attach yourself to another story. Sometimes this can be related to a direct competitor; sometimes it&amp;rsquo;s simply tangential. Being willing to quickly invest time when an opportunity presents itself is key to taking advantage of those opportunities. But please don&amp;rsquo;t let such opportunities be your only way of capturing attention; there should still be a steady beat and focus.&lt;/p>
&lt;h3 id="let-your-beliefs-come-out" >
&lt;div>
Let your beliefs come out
&lt;/div>
&lt;/h3>
&lt;p>Nearly every time I sit down with some founder or very early employee at a company, the vibe and impression I get from them is an order of magnitude stronger than the company&amp;rsquo;s public persona. At the root of every company trying to do something big is an acute focus on a problem, with strong opinions about how to solve it. You don&amp;rsquo;t win people over by giving middle-of-the-road opinions.&lt;/p>
&lt;p>Heroku&amp;rsquo;s often been an example of being extremely opinionated. For a long time you found bits of this within our product, such as with an ephemeral filesystem – which in the long term enables scalability. Or with directing the separation of code and config – which helps reproducibility when things go wrong, and spinning up new copies of your app.&lt;/p>
&lt;p>Again, the biggest problem with this opinionated approach wasn&amp;rsquo;t that it existed, but that it wasn&amp;rsquo;t talked about clearly or loudly enough. It&amp;rsquo;s now much clearer and broader in the form of &lt;a href="http://12factor.net/">12 Factor&lt;/a>, which fully codifies those strong opinions that influence the product, but also has applicability outside of Heroku.&lt;/p>
&lt;h3 id="all-of-the-approaches" >
&lt;div>
All of the approaches
&lt;/div>
&lt;/h3>
&lt;p>Doing just one of the above really isn&amp;rsquo;t enough. Having multiple types of content, such as the three above, allows you to be much more effective. Of course the way you manage and distribute them changes based on the type of content, but more on that later.&lt;/p></description></item><item><title>Rethinking the limits on relational databases</title><link>/2014/01/24/Rethinking-the-limits-on-relational-databases/</link><pubDate>Fri, 24 Jan 2014 12:55:56 -0800</pubDate><guid>/2014/01/24/Rethinking-the-limits-on-relational-databases/</guid><description>&lt;p>There&amp;rsquo;s a lot of back and forth on NoSQL databases. The unfortunate part of all the back and forth and unclear definitions of NoSQL is that many of the valuable learnings are lost. This post isn&amp;rsquo;t about the differences in NoSQL definitions, but rather about some of the huge benefits that do exist in what&amp;rsquo;s often grouped into the schema-less world, benefits that could easily be applied to the relational world.&lt;/p>
&lt;h3 id="forget-migrations" >
&lt;div>
Forget migrations
&lt;/div>
&lt;/h3>
&lt;p>Perhaps the best thing about the idea of a schemaless database is that you can just push code and it works. Almost exactly five years ago Heroku shipped &lt;code>git push heroku master&lt;/code>, letting you simply push code from git and have it just work. CouchDB and MongoDB have done similar for databases&amp;hellip; you don&amp;rsquo;t have to run &lt;code>CREATE TABLE&lt;/code> or &lt;code>ALTER TABLE&lt;/code> migrations before working with your database. There&amp;rsquo;s something wonderful about just building and shipping your application without worrying about migrations.&lt;/p>
&lt;p>This is often viewed as a limitation of relational databases. Yet it doesn&amp;rsquo;t really have to be. You see, even in a schema-less database the relationships are still there; it&amp;rsquo;s just that you&amp;rsquo;re managing them at the application level. There&amp;rsquo;s no reason higher-level frameworks or ORMs couldn&amp;rsquo;t handle the migration process. Even today, the process of adding a column to a relational database is quite straightforward, in the sense that it doesn&amp;rsquo;t introduce downtime and is capable of letting the developer still move quickly; it&amp;rsquo;s just not automatically baked in.&lt;/p>
&lt;pre>&lt;code>-- Assuming a column that's referenced doesn't exist,
-- your ORM could automatically execute the relevant bits.
-- This isn't code meant for you to run by hand.
ALTER TABLE foo ADD COLUMN bar varchar(255);  -- this is near instant
-- backfill the default value set in your ORM
UPDATE foo SET bar = 'DEFAULT VALUE' WHERE bar IS NULL;
ALTER TABLE foo ALTER COLUMN bar SET NOT NULL;
&lt;/code>&lt;/pre>
&lt;p>If Rails/Django/(framework of your choice) automatically noticed the need for a column to exist and made the appropriate modifications, you could work with it the same way you would manage a document relation in your code. Sure, this is a manual, painful process today, but there&amp;rsquo;s no reason it can&amp;rsquo;t be fully handled by PostgreSQL or directly within an ORM.&lt;/p>
&lt;h3 id="documents" >
&lt;div>
Documents
&lt;/div>
&lt;/h3>
&lt;p>The other really strong case for the MongoDB/CouchDB camp is document storage. In this case I&amp;rsquo;m going to equate a document directly to a JSON object. JSON itself is a wonderfully simple model that works well for portability, and having to convert it within your application layer is, well, just painful. Yes, Postgres has a JSON datatype, and the JSON datatype is now being adopted by many other relational databases. &lt;em>I was shocked to hear that DB2 is getting support for JSON myself; while I expected improvements to come to it, JSON was not at the top of my list.&lt;/em>&lt;/p>
&lt;p>And JSON absolutely makes sense as a data type within a column. But that&amp;rsquo;s still a bit limiting as a full document store; what you want in those cases is any query result as a full JSON object. It&amp;rsquo;s heavily undersold that within Postgres you can simply convert a full row to JSON with a &lt;a href="http://www.postgresql.org/docs/9.3/static/functions-json.html">single function&lt;/a> - &lt;code>row_to_json&lt;/code>.&lt;/p>
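&lt;p>Assuming a hypothetical &lt;code>users&lt;/code> table, returning full rows as JSON is a one-liner:&lt;/p>
&lt;pre>&lt;code>-- each matching row comes back as a single JSON object
SELECT row_to_json(u)
FROM users u
WHERE id = 1;
&lt;/code>&lt;/pre>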
&lt;p>Again having higher level frameworks take full advantage so that under the covers you can have your strongly typed tables, but a flexibility to map them to flexible JSON objects makes a great deal of sense here.&lt;/p>
&lt;h3 id="out-of-the-box-interfaces" >
&lt;div>
Out of the box interfaces
&lt;/div>
&lt;/h3>
&lt;p>This isn&amp;rsquo;t a strict benefit of schema-less databases. Some schema-less databases, such as Couch, have this more out of the box, while others less so. The concept of exposing a REST interface is not something new, and it has been tried on top of relational databases a &lt;a href="http://htsql.org/">few times over&lt;/a>. This is clearly something that does need to be delivered. The case for it is pretty clear: it reduces the work of people having to recreate admin screens and gives an easy onboarding process for noobs.&lt;/p>
&lt;p>Unfortunately there&amp;rsquo;s no clear progress on this today for Postgres or other relational databases. In contrast, other databases are delivering on this front, often from day one :/&lt;/p>
&lt;h3 id="where-to" >
&lt;div>
Where to
&lt;/div>
&lt;/h3>
&lt;p>Some of the shifts in schema-less databases, or really in other databases in general, are not so large that they cannot be subsumed into a broader option. At the same time there are some strong merits, such as the ones above, which do take an active effort to deliver on, expanding what a &amp;ldquo;relational database&amp;rdquo; is.&lt;/p></description></item><item><title>Where to start with developer content</title><link>/2014/01/21/Where-to-start-with-developer-content/</link><pubDate>Tue, 21 Jan 2014 12:55:56 -0800</pubDate><guid>/2014/01/21/Where-to-start-with-developer-content/</guid><description>&lt;p>Getting the word out&lt;/p>
&lt;h3 id="hacker-news" >
&lt;div>
Hacker News
&lt;/div>
&lt;/h3>
&lt;p>When it comes to marketing, specifically to developers, the most common question is: how do I get on Hacker News? The second most common is: well, in addition to there, what else matters? This fully depends on your audience. If you really only care about the former, versus the broader issue of creating a sustainable model for circulating your content, then just read these links and move on. If you want a full model for getting your content out there, then keep reading.&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://nathanael.hevenet.com/the-best-time-to-post-on-hacker-news-a-comprehensive-answer/">The best time to post to hacker news&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://alexstechthoughts.com/post/29406022580/how-to-get-on-the-frontpage-of-hacker-news">How to get on the front page of hacker news&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="more-than-just-hacker-news" >
&lt;div>
More than just Hacker News
&lt;/div>
&lt;/h3>
&lt;p>First off, don&amp;rsquo;t get me wrong: HN can provide a great surge of short-term traffic. There are a few problems I have with this though. To begin with, it gives you one shot to get your message perfect; if you have the wrong message, miss your call to action, or forget an affiliate link, then when you get your 15k viewers, well, you get no second shot at it. More interestingly, and a bit of a gut feel, I&amp;rsquo;ve noticed that that traffic contains more bounces and lower engagement. And then there&amp;rsquo;s the issue that getting on there is definitely not a science&amp;hellip;&lt;/p>
&lt;h3 id="other-news-sites" >
&lt;div>
Other news sites
&lt;/div>
&lt;/h3>
&lt;p>Of course there are other news sites; ones of relevance include reddit, dzone, and monacle. I&amp;rsquo;ve found each of these can be similar in some ways to Hacker News. Yet they have a bit longer of a shelf life, giving me a slightly higher propensity to give them some attention. At the same time they also have a smaller reach.&lt;/p>
&lt;h3 id="twitter" >
&lt;div>
Twitter
&lt;/div>
&lt;/h3>
&lt;p>Twitter is yet another method that can work quite well. Observations of it include lower traffic than something like Hacker News, and equal-to-better engagement overall. Perhaps the most interesting piece is that you&amp;rsquo;ll often get more intelligent and also more positive discussion on Twitter vs. HN. There is one huge fail I find with Twitter that nearly every company commits: the author of the post or the company Twitter handle posts it, then 10 people proceed to retweet it within 10 minutes. The problem with this, if you&amp;rsquo;re anything like places I&amp;rsquo;ve worked at, is that you have a strong overlap of followers; this, combined with the ephemeral nature of Twitter, means you&amp;rsquo;re diminishing your reach. Instead, having a few people (don&amp;rsquo;t make it a strict requirement) retweet later in the day can help broaden your reach.&lt;/p>
&lt;h3 id="google-plus" >
&lt;div>
Google Plus
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;m slightly ashamed to be including this on here&amp;hellip; I don&amp;rsquo;t use it, I don&amp;rsquo;t care for it, etc. I have no hard evidence of this either, but publishing on Google Plus seems to speed up Google&amp;rsquo;s indexing of the article to be much closer to instant. Take it for what you will.&lt;/p>
&lt;h3 id="email" >
&lt;div>
Email
&lt;/div>
&lt;/h3>
&lt;p>That thing you still get too much of is a great way of getting content out.&lt;/p>
&lt;h3 id="long-tail" >
&lt;div>
Long tail
&lt;/div>
&lt;/h3>
&lt;p>Please please please do not underestimate the value in the long tail of your content. Google sends me between 200 and 500 new uniques a day due to various articles. Think about the areas you want to rank for and create your content for them, work on getting it out and syndicated, then&lt;/p></description></item><item><title>Where to start with developer content</title><link>/2014/01/16/Where-to-start-with-developer-content/</link><pubDate>Thu, 16 Jan 2014 12:55:56 -0800</pubDate><guid>/2014/01/16/Where-to-start-with-developer-content/</guid><description>&lt;p>Commonly at developer focused companies a question will come up from the marketing team: &amp;ldquo;How do we get content that developers find interesting&amp;rdquo;? Or how can I get our developers to blog more? Or some other similar question. In general, the question of creating content and engaging with developers is a very common one, and often there&amp;rsquo;s a mismatch between what marketing wants to do and what developers appreciate.&lt;/p>
&lt;h3 id="stop-marketing" >
&lt;div>
Stop marketing
&lt;/div>
&lt;/h3>
&lt;p>Forget trying to &amp;ldquo;market&amp;rdquo; to developers. Hopefully you at least have developers that believe in the product they&amp;rsquo;re building; if that&amp;rsquo;s not the case then find a new product or a new team. If you&amp;rsquo;ve got a product targeted at developers and a team that believes in it, then you&amp;rsquo;re already halfway there to marketing it. Now back to the first point: forget trying to market it. Start with building some form of an audience, reputation, and respect among other developers. This isn&amp;rsquo;t done through ads, email marketing, SEO, or any of that. It&amp;rsquo;s done by creating content that developers find interesting. As a first step forget your product entirely, but don&amp;rsquo;t worry, we&amp;rsquo;ll get there soon enough.&lt;/p>
&lt;h3 id="sourcing-content" >
&lt;div>
Sourcing content
&lt;/div>
&lt;/h3>
&lt;p>The first piece of it, finding content, should actually be extremely simple. Typically engineers love sharing knowledge and information. At least once a week there&amp;rsquo;s an email out to all the engineers about a truly interesting approach to something. This content is often not in a perfect form for external publication, but quite close. In particular Heroku has &lt;a href="https://twitter.com/mmcgrana">one employee&lt;/a>, an early employee and now architect, where every email he sends to such a group I pull down and save for future reading. Another example of this was one of the Heroku founders, Adam Wiggins; you can find many similar emails slightly cleaned up as blog posts on his own &lt;a href="https://adam.heroku.com/">blog&lt;/a>.&lt;/p>
&lt;p>Take these emails, find someone technical enough to clean them up, and ship them. Your goal here is simply to build some level of connection with other developers. Now, a lot of the time these may not be in the right &amp;ldquo;voice&amp;rdquo; for your company blog. That&amp;rsquo;s quite fine; I&amp;rsquo;m a strong proponent of letting developers create their own personalities. The place for the content then may not always be on the company blog. In general I find there are three groupings:&lt;/p>
&lt;ul>
&lt;li>Content for the company blog by an employee&lt;/li>
&lt;li>Content for an individual&amp;rsquo;s blog (the caveat here is they need to regularly create content - every 6 months doesn&amp;rsquo;t cut it)&lt;/li>
&lt;li>Content for an &lt;a href="http://codeascraft.com/">engineering blog&lt;/a> (if you have enough of the above that blog infrequently this is a great home for it)&lt;/li>
&lt;/ul>
&lt;h3 id="dont-worry-about-the-product-yet" >
&lt;div>
Don&amp;rsquo;t worry about the product yet
&lt;/div>
&lt;/h3>
&lt;p>No really, don&amp;rsquo;t worry about pitching your product. There was an awesome piece on the &lt;a href="http://insideintercom.io/new-features-usually-flop/">Intercom blog talking about why most features fail&lt;/a> and how companies pitch the details versus the problem they solve. There was a hidden gem in there:&lt;/p>
&lt;blockquote>&lt;p>Telling your customers something is a “ground up rewrite”, “HTML5 based”, “responsive” or anything like that will miss the mark unless you’re selling to developers.&lt;/p>
&lt;p>&amp;mdash; Des Traynor, &lt;a href="http://insideintercom.io/new-features-usually-flop/">New Features Usually Flop&lt;/a>&lt;/p>&lt;/blockquote>
&lt;p>For companies targeting developers this actually works really well; as a developer I care about the how. Simply put, it&amp;rsquo;s interesting. Another great example of this is &lt;a href="http://blog.priceonomics.com/">priceonomics&lt;/a>. To be honest I only checked what they actually do while writing this post, but I regularly find their posts interesting.&lt;/p>
&lt;h3 id="whats-next" >
&lt;div>
Whats next
&lt;/div>
&lt;/h3>
&lt;p>What do you want to know about? Creating a voice/brand, starting to pitch your product, content distribution? Let me know. A good amount of my time is spent on these, and I&amp;rsquo;m happy to discuss further what&amp;rsquo;s valuable to others so we don&amp;rsquo;t have to suffer through painful marketing. Let me know at &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a> or &lt;a href="http://www.twitter.com/craigkerstiens">@craigkerstiens&lt;/a>&lt;/p></description></item><item><title>Digging in with Foreign Tables</title><link>/2013/12/19/Digging-in-with-Foreign-Tables/</link><pubDate>Thu, 19 Dec 2013 12:55:56 -0800</pubDate><guid>/2013/12/19/Digging-in-with-Foreign-Tables/</guid><description>&lt;p>I wrote a couple months back about exploring FDWs. It&amp;rsquo;s become quite clear to me that, despite still having ample room for improvement, they&amp;rsquo;re not getting enough attention. Foreign data wrappers are perhaps better thought of as foreign tables, or even better yet as a view into some remote data source. They don&amp;rsquo;t take care of auto-updating or syncing data; that&amp;rsquo;s all up to you. But they give you a straightforward mapping that makes working with remote data easier.&lt;/p>
&lt;p>The first step in working with FDWs is getting them setup. I wrote an earlier post on how to do this manually. And if you&amp;rsquo;re on Heroku there&amp;rsquo;s an even easier solution if you want to setup a mapping entirely from one DB to another. The &lt;a href="">pg-extras CLI plugin&lt;/a> has a command &lt;code>fdwsql&lt;/code> which will generate the SQL to map all the tables for you. To run it, simply specify a prefix along with the app and database:&lt;/p>
&lt;pre>&lt;code>heroku pg:fdwsql yourprefix APP::DATABASE_URL
&lt;/code>&lt;/pre>
&lt;p>This will generate a lot of SQL. From here you&amp;rsquo;ll want to connect to the database where you want those foreign tables to be visible, then run all the SQL. This will create all the foreign tables, which will mostly look just like another view or table to you in &lt;code>\d&lt;/code>.&lt;/p>
&lt;h3 id="tips-on-working-with-them" >
&lt;div>
Tips on working with them
&lt;/div>
&lt;/h3>
&lt;p>For the most part you can work with your foreign tables just like any other view or table. You can insert into them, read from them, join against them. Though currently foreign tables have some performance limitations: when joining, for example, Postgres may pull back far more remote data than you expect before performing the join locally. To make your performance a bit more ideal you can follow a few basic principles.&lt;/p>
&lt;p>Let&amp;rsquo;s look at some example tables to highlight this:&lt;/p>
&lt;pre>&lt;code>&amp;gt; \d
users
todos
&lt;/code>&lt;/pre>
&lt;p>In this case users is local and todos is a foreign table. Looking at each of the schemas we have something like you might expect:&lt;/p>
&lt;pre>&lt;code>&amp;gt; \d users
Table &amp;quot;public.users&amp;quot;
Column | Type | Modifiers
-------------+-----------------------------+-----------
id | integer | not null
email | text |
created_at | timestamp without time zone |
Indexes:
&amp;quot;users_pkey&amp;quot; PRIMARY KEY, btree (id)
&amp;quot;users_created&amp;quot; btree (created_at)
&amp;gt; \d todos
Foreign Table &amp;quot;public.todos&amp;quot;
Column | Type | Modifiers
-------------+-----------------------------+-----------
id | integer | not null
user_id | integer |
desc | text |
created_at | timestamp without time zone |
status | boolean |
Indexes:
&amp;quot;todos_pkey&amp;quot; PRIMARY KEY, btree (id)
&amp;quot;todo_created&amp;quot; btree (created_at)
&lt;/code>&lt;/pre></description></item><item><title>The best Postgres feature you're not using – CTEs aka WITH clauses</title><link>/2013/11/18/best-postgres-feature-youre-not-using/</link><pubDate>Mon, 18 Nov 2013 12:55:56 -0800</pubDate><guid>/2013/11/18/best-postgres-feature-youre-not-using/</guid><description>&lt;p>SQL by default isn&amp;rsquo;t typically friendly to dive into, especially if you&amp;rsquo;re reading someone else&amp;rsquo;s already created queries. For some reason most people throw out principles we follow in other languages &lt;a href="http://www.craigkerstiens.com/2013/07/29/documenting-your-postgres-database/">such as commenting&lt;/a> and composability just for SQL. I was recently reminded of a key feature in Postgres that most don&amp;rsquo;t use by &lt;a href="http://www.twitter.com/timonk">@timonk&lt;/a> highlighting it in his AWS Re:Invent Redshift talk. This simple feature makes SQL both readable and composable; even with my own queries I can now come back months later and understand them, where previously I could not.&lt;/p>
&lt;p>The feature itself is known as CTEs or common table expressions; you may also hear them referred to as &lt;code>WITH&lt;/code> clauses. The general idea is that they allow you to create something roughly equivalent to a view that only exists for the duration of that query. You can create multiple of these, which then act as clear building blocks and make it simple to follow what you&amp;rsquo;re doing.&lt;/p>
&lt;p>Let&amp;rsquo;s take a look at a nice simple one:&lt;/p>
&lt;pre>&lt;code>WITH users_tasks AS (
SELECT
users.email,
array_agg(tasks.name) as task_list,
projects.title
FROM
users,
tasks,
projects
WHERE
users.id = tasks.user_id
AND projects.id = tasks.project_id
GROUP BY
users.email,
projects.title
)
&lt;/code>&lt;/pre>
&lt;p>Using this I could now just append another basic query onto the end that references this CTE &lt;code>users_tasks&lt;/code>. Something akin to:&lt;/p>
&lt;pre>&lt;code>SELECT *
FROM users_tasks;
&lt;/code>&lt;/pre>
&lt;p>But where it becomes more interesting is chaining these together. So while I have all tasks assigned to each user here, perhaps I then want to find which users are responsible for more than 50% of the tasks on a given project, thus being the bottleneck. To oversimplify, we could do this in a couple of steps: total up the tasks for each project, then total up the tasks for each user per project:&lt;/p>
&lt;pre>&lt;code>total_tasks_per_project AS (
SELECT
project_id,
count(*) as task_count
FROM tasks
GROUP BY project_id
),
tasks_per_project_per_user AS (
SELECT
user_id,
project_id,
count(*) as task_count
FROM tasks
GROUP BY user_id, project_id
),
&lt;/code>&lt;/pre>
&lt;p>Then we would want to combine and find the users that are now over that 50%:&lt;/p>
&lt;pre>&lt;code>overloaded_users AS (
SELECT tasks_per_project_per_user.user_id
FROM tasks_per_project_per_user,
total_tasks_per_project
WHERE tasks_per_project_per_user.project_id = total_tasks_per_project.project_id
AND tasks_per_project_per_user.task_count &amp;gt; (total_tasks_per_project.task_count / 2.0)
)
&lt;/code>&lt;/pre>
&lt;p>Now as a final goal I&amp;rsquo;d want to get a comma separated list of tasks for the overloaded users. So we&amp;rsquo;re simply going to join against that &lt;code>overloaded_users&lt;/code> and our initial list of &lt;code>users_tasks&lt;/code>. Putting it all together it looks somewhat long, but becomes much more readable. And as a bonus I layered in some comments.&lt;/p>
&lt;pre>&lt;code>--- Created by Craig Kerstiens 11/18/2013
--- Query highlights users that have over 50% of tasks on a given project
--- Gives comma separated list of their tasks and the project
--- Initial query to grab project title and tasks per user
WITH users_tasks AS (
SELECT
users.id as user_id,
users.email,
array_agg(tasks.name) as task_list,
projects.title
FROM
users,
tasks,
projects
WHERE
users.id = tasks.user_id
AND projects.id = tasks.project_id
GROUP BY
users.id,
users.email,
projects.title
),
--- Calculates the total tasks per each project
total_tasks_per_project AS (
SELECT
project_id,
count(*) as task_count
FROM tasks
GROUP BY project_id
),
--- Calculates the projects per each user
tasks_per_project_per_user AS (
SELECT
user_id,
project_id,
count(*) as task_count
FROM tasks
GROUP BY user_id, project_id
),
--- Gets user ids that have over 50% of tasks assigned
overloaded_users AS (
SELECT tasks_per_project_per_user.user_id
FROM tasks_per_project_per_user,
total_tasks_per_project
WHERE tasks_per_project_per_user.project_id = total_tasks_per_project.project_id
AND tasks_per_project_per_user.task_count &amp;gt; (total_tasks_per_project.task_count / 2.0)
)
SELECT
email,
task_list,
title
FROM
users_tasks,
overloaded_users
WHERE
users_tasks.user_id = overloaded_users.user_id
&lt;/code>&lt;/pre>
&lt;p>CTEs won&amp;rsquo;t always be quite as performant as SQL hand-optimized to be as concise as possible. In most cases I have seen performance differences smaller than 2x, and this tradeoff for readability is a no-brainer as far as I&amp;rsquo;m concerned. And with time the Postgres optimizer should continue to get better at such cases.&lt;/p>
&lt;p>As for the verbosity, yes I could have done this query in probably 10-15 lines of very concise SQL. Yet, most people may not be able to understand it quickly, if at all. Readability is huge when it comes to SQL to ensure it&amp;rsquo;s doing the right thing. SQL will almost always tell you an answer, it just may not be to the question you think you&amp;rsquo;re asking. Ensuring your queries can be reasoned about is critical to ensuring accuracy, and CTEs are one great way of accomplishing that.&lt;/p>
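&lt;p>For intuition, here&amp;rsquo;s a minimal in-memory sketch of the same over-50% check the CTEs build up. This is a hypothetical illustration in Ruby rather than the post&amp;rsquo;s SQL; tasks are represented as [user_id, project_id] pairs:&lt;/p>

```ruby
# Hypothetical in-memory version of the CTE pipeline above.
# tasks is a list of [user_id, project_id] pairs.
def overloaded_users(tasks)
  # total_tasks_per_project: count of tasks per project
  total_per_project = tasks.group_by { |_user, project| project }
                           .transform_values(&:size)
  # tasks_per_project_per_user: count per (user, project) pair
  per_user_project = tasks.group_by(&:itself).transform_values(&:size)
  # overloaded_users: users holding more than half of a project's tasks
  per_user_project.select { |(_user, project), count|
    count > total_per_project[project] / 2.0
  }.keys.map(&:first).uniq
end
```

Each local variable maps one-to-one to a CTE, which is exactly the readability benefit the SQL version gets from `WITH`.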
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Tooling for Simple but Informative Emails</title><link>/2013/10/13/Tooling-for-Simple-but-Informative-Emails/</link><pubDate>Sun, 13 Oct 2013 12:55:56 -0800</pubDate><guid>/2013/10/13/Tooling-for-Simple-but-Informative-Emails/</guid><description>&lt;p>Emails are one of my favorite methods of communicating with users. It works as a quick test for product validation. It works well scaling from one -&amp;gt; some -&amp;gt; many -&amp;gt; all. It&amp;rsquo;s still highly effective even with as much noise as we receive in our inboxes. Over the years I&amp;rsquo;ve tried a lot of email tools, from custom built solutions, to newer entrants that help with drip actions (&lt;a href="http://www.intercom.io">intercom.io&lt;/a> and &lt;a href="http://www.customer.io">customer.io&lt;/a>), to more &amp;ldquo;enterprise&amp;rdquo; tools such as Marketo. While I have varying opinions on all of those, I still find myself coming back to a simple one-off script setup to deliver clear concise emails.&lt;/p>
&lt;h3 id="getting-the-data" >
&lt;div>
Getting the Data
&lt;/div>
&lt;/h3>
&lt;p>The first step of any email is deciding what you want to do, but hopefully you know that already. The part that is usually a bit more effort is actually getting the list to send it to and formatting it appropriately. I usually opt for SQL. While the specifics of the query of course always vary, it commonly follows a general structure:&lt;/p>
&lt;pre>&lt;code>WITH initial_data AS (
SELECT
email,
app_name,
information_about_app
FROM
users,
apps
WHERE users.id = apps.user_id
AND some_filter_to_limit_data
),
candidates_for_email AS ... --- likely to have additional CTEs
--- Finally I build up the list
SELECT email,
array_to_string(array_agg(data_for_email), '
') --- an important note is to add a newline or not here depending on how you wish to format it
FROM candidates_for_email
GROUP BY email;
&lt;/code>&lt;/pre>
&lt;p>The query structure you&amp;rsquo;ll want is: first column the email address, second column whatever data you want to include in your email.&lt;/p>
&lt;p>From here I usually create a dataclip of it. This makes it easy to allow my data to change over time. If I&amp;rsquo;m testing an email for data over the last 7 days I just come back in 7 days and I have new data. It also lets me easily share and iterate on the data. The nice part is there&amp;rsquo;s an easy way to click a button and get the data as a CSV which is what you want for sending.&lt;/p>
&lt;p>Once you download the CSV you&amp;rsquo;ll want to remove the header line, as it&amp;rsquo;s not needed for the script.&lt;/p>
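&lt;p>If you&amp;rsquo;d rather script that cleanup than edit the file by hand, here&amp;rsquo;s a small sketch (a hypothetical helper, in Ruby since the send script below is Ruby):&lt;/p>

```ruby
require 'csv'

# Hypothetical helper: drop the header row from a dataclip CSV export,
# returning just the data rows (email first, email content second).
def strip_header(csv_text)
  CSV.parse(csv_text).drop(1)
end
```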
&lt;h3 id="sending-the-mail" >
&lt;div>
Sending the Mail
&lt;/div>
&lt;/h3>
&lt;p>To actually send the email you&amp;rsquo;ll need this script, which is largely credited to &lt;a href="http://www.twitter.com/leinweber">@leinweber&lt;/a>:&lt;/p>
&lt;pre>&lt;code>require 'mail'
require 'csv'
FILE = ARGV[0]
Mail.defaults do
delivery_method :smtp, {
address: 'smtp address',
port: 587,
domain: 'gmail.com',
user_name: 'craig.kerstiens@gmail.com',
password: ENV.fetch('EMAIL_PASSWORD'),
authentication: :plain,
enable_starttls_auto: true
}
end
def send_email(address, app)
mail = Mail.new do
to address
from 'Craig Kerstiens &amp;lt;craig.kerstiens@gmail.com&amp;gt;'
subject &amp;quot;Your email subject in here&amp;quot;
body generate_body(app)
end
end
def generate_body(app)
%Q(
Hi,
Your list of apps:
#{app}
Various email content in here...
)
end
CSV.parse(File.read(FILE)).each do |line|
address = line[0]
app = line[1]
m = send_email(address, app)
puts m.to_s
p m.deliver!
puts
puts
end
&lt;/code>&lt;/pre>
&lt;p>&lt;em>You&amp;rsquo;ll want to make sure to export the password for your email provider with &lt;code>export EMAIL_PASSWORD=pw_here&lt;/code>&lt;/em>&lt;/p>
&lt;p>You can easily download this script from &lt;a href="https://gist.github.com/craigkerstiens/6922897">GitHub&amp;rsquo;s Gist&lt;/a>. I&amp;rsquo;d recommend using an email service provider other than Gmail, such as &lt;a href="http://www.mailgun.com">mailgun&lt;/a>, as they&amp;rsquo;re built to handle sending a large volume of email. Finally send your emails:&lt;/p>
&lt;pre>&lt;code>ruby email.rb nameofyourfile.csv
&lt;/code>&lt;/pre></description></item><item><title>Disabling muting while typing in Google hangouts</title><link>/2013/09/12/Disabling-muting-while-typing-in-Google-hangouts/</link><pubDate>Thu, 12 Sep 2013 12:55:56 -0800</pubDate><guid>/2013/09/12/Disabling-muting-while-typing-in-Google-hangouts/</guid><description>&lt;p>Google Hangouts is awesome; it&amp;rsquo;s my preferred method for most audio/video calls these days. When running a group call I often dial in from a separate phone if I have a better one available for the group. That also got around the annoyance that Google automatically mutes you while you are typing, which for most people is pretty subpar. While dialing in to the hangout can still be nice, you don&amp;rsquo;t have to do so to get rid of the annoying muting while typing. To fix it, simply open up your terminal and run:&lt;/p>
&lt;pre>&lt;code> defaults write com.google.googletalkplugind exps -string [\&amp;quot;-tm\&amp;quot;]
&lt;/code>&lt;/pre>
&lt;p>&lt;em>This clever hack discovered courtesy of &lt;a href="http://www.twitter.com/timtyrrell">@timtyrrell&lt;/a> passed along to me by &lt;a href="http://www.twitter.com/mattmanning">@mattmanning&lt;/a> and &lt;a href="http://www.twitter.com/blakegentry">@blakegentry&lt;/a>&lt;/em>&lt;/p></description></item><item><title>Diving into Postgres JSON operators and functions</title><link>/2013/09/11/Diving-into-Postgres-JSON-operators-and-functions/</link><pubDate>Wed, 11 Sep 2013 12:55:56 -0800</pubDate><guid>/2013/09/11/Diving-into-Postgres-JSON-operators-and-functions/</guid><description>&lt;p>Just as &lt;a href="https://postgres.heroku.com/blog/past/2013/9/9/postgres_93_now_available/">PostgreSQL 9.3&lt;/a> was coming out I had a need to take advantage of the JSON datatype and some of the &lt;a href="http://www.postgresql.org/docs/9.3/static/functions-json.html">operators and functions&lt;/a> within it. The use case was pretty simple, run a query across a variety of databases, then take the results and store them. We explored doing something more elaborate with the columns/values, but in the end just opted to save the entire result set as JSON then I could use the operators to explore it as desired.&lt;/p>
&lt;p>Here&amp;rsquo;s the general idea in code (using sequel):&lt;/p>
&lt;pre>&lt;code>result = r.connection { |c| c.fetch(self.query).all }
mymodel.results = result.to_json
&lt;/code>&lt;/pre>
&lt;p>As the entire dataset was stored as a single JSON document I needed to do a bit of manipulation to get it back into a workable form. Fortunately all the steps were fairly straightforward.&lt;/p>
&lt;p>First you want to unnest each result from the json array, in my case this looked like:&lt;/p>
&lt;pre>&lt;code>SELECT json_array_elements(result)
&lt;/code>&lt;/pre>
&lt;p>The above will unnest all of the array elements so I have each individual result as JSON. A real world example would look something like:&lt;/p>
&lt;pre>&lt;code>SELECT json_array_elements(result)
FROM query_results
LIMIT 2;
json_array_elements
---------------------------------------
{&amp;quot;column_name&amp;quot;:&amp;quot;data_in_here&amp;quot;}
{&amp;quot;column_name_2&amp;quot;:&amp;quot;other_data_in_here&amp;quot;}
(2 rows)
&lt;/code>&lt;/pre>
&lt;p>From here based on the query I would want to get some specific value. In this case I&amp;rsquo;m going to look up the text key &lt;code>column_name_2&lt;/code>:&lt;/p>
&lt;pre>&lt;code>SELECT json_array_elements(result)-&amp;gt;'column_name_2'
FROM query_results
LIMIT 1;
json_array_elements
-----------------------
&amp;quot;other_data_in_here&amp;quot;
(1 row)
&lt;/code>&lt;/pre>
&lt;p>&lt;em>One gotcha I encountered was when I wanted to search for some value or exclude some value&amp;hellip; Expecting I could just compare the result of the above in a WHERE clause, I was sadly mistaken because the equals operator isn&amp;rsquo;t defined for the &lt;code>json&lt;/code> type.&lt;/em> My first attempt at fixing this was to cast in this form:&lt;/p>
&lt;pre>&lt;code>SELECT json_array_elements(result)-&amp;gt;'column_name_2'::text
&lt;/code>&lt;/pre>
&lt;p>The sad part is that because the cast binds more tightly than the &lt;code>-&amp;gt;&lt;/code> operator, it gets applied to &lt;code>'column_name_2'&lt;/code> rather than to the whole expression. Instead you&amp;rsquo;ll want to do:&lt;/p>
&lt;pre>&lt;code>SELECT (json_array_elements(result)-&amp;gt;'column_name_2')::text
&lt;/code>&lt;/pre>
&lt;p>Of course there&amp;rsquo;s plenty more you can do with the &lt;a href="http://www.postgresql.org/docs/9.3/static/functions-json.html">JSON operators in the new Postgres 9.3&lt;/a>. If you&amp;rsquo;ve already got JSON in your application give them a look today. And while slightly less convenient, if you&amp;rsquo;ve got JSON stored in a text field simply cast it with &lt;code>::json&lt;/code> to begin using the operators.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>The Rule of Thirds - followup</title><link>/2013/08/13/The-Rule-of-Thirds-followup/</link><pubDate>Tue, 13 Aug 2013 12:55:56 -0800</pubDate><guid>/2013/08/13/The-Rule-of-Thirds-followup/</guid><description>&lt;p>Several months back I wrote about how we do &lt;a href="http://www.craigkerstiens.com/2013/03/13/planning-and-prioritizing/">higher level, long term planning within the Heroku Postgres team&lt;/a>. If you haven&amp;rsquo;t read the previous article please start there.&lt;/p>
&lt;p>The exercise or rule of thirds is intended to be approximate prioritization and not a perfect science. Since that time I&amp;rsquo;ve become familiar with some teams, both in and out of Heroku, who have attempted this exercise with varying levels of success. We&amp;rsquo;ve now done this process 4 times within the team, and after the most recent exercise we took some time to internalize why it&amp;rsquo;s worked well, creating some more specifics about the process. Here&amp;rsquo;s an attempt to provide even more clarity:&lt;/p>
&lt;h2 id="gather-data-ahead-of-time" >
&lt;div>
Gather data ahead of time
&lt;/div>
&lt;/h2>
&lt;p>It&amp;rsquo;s really common to have a list of things to work on, but knowing the impact of those is commonly pure speculation. There may be some people that talk to customers, but even then it&amp;rsquo;s a subset of your actual customer base. It helps to go into the exercise with as much data as you can gather ahead of time on the impact of features and specific problems. In our case we do this by:&lt;/p>
&lt;ol>
&lt;li>Surveying current customers and users&lt;/li>
&lt;li>Surveying attriters&lt;/li>
&lt;li>Engaging with customer facing teams to hear trends&lt;/li>
&lt;li>Input from external parties such as analysts on trends&lt;/li>
&lt;/ol>
&lt;h2 id="allow-for-casual-discussion" >
&lt;div>
Allow for casual discussion
&lt;/div>
&lt;/h2>
&lt;p>We typically conduct our planning exercise at an offsite; this is a multi-day time of team bonding, planning, and hacking. We intentionally schedule our planning exercise towards the end of the offsite. This allows us to have updates/presentations from the data we&amp;rsquo;ve gathered and from those that are customer facing. Presentations are meant to be short and direct, and discussion can flow casually after. This gets a lot of people on the same page at a smaller level and reduces the problem of too many cooks in the kitchen come time for the actual exercise.&lt;/p>
&lt;h2 id="the-rule-of-thirds" >
&lt;div>
The rule of thirds
&lt;/div>
&lt;/h2>
&lt;h3 id="creating-the-list" >
&lt;div>
Creating the list
&lt;/div>
&lt;/h3>
&lt;p>Coming to the exercise itself&amp;hellip; We begin by everyone individually writing a list of their ideas; this is meant to be a list of the features we want to place on the grid. At this point there&amp;rsquo;s no prioritizing of difficulty or impact. In addition, while each list is individually created it does not have to contain items that only pertain to you; it&amp;rsquo;s more a comprehensive list of all the things you can think of that may be important to do.&lt;/p>
&lt;h3 id="bucketing-part-1" >
&lt;div>
Bucketing part 1
&lt;/div>
&lt;/h3>
&lt;p>Once individual lists are created you can then collectively or designate one or two people to clean it up. We do this in two forms:&lt;/p>
&lt;ol>
&lt;li>Removing duplicate items, which there should be several of.&lt;/li>
&lt;li>Bucketing by a common theme/idea, which simply makes things more digestible&lt;/li>
&lt;/ol>
&lt;p>If you&amp;rsquo;re a big group of greater than 7, it may be advisable to designate two people to do this exercise together. For a smaller group it can be manageable to coordinate collectively.&lt;/p>
&lt;h3 id="bucketing-part-2" >
&lt;div>
Bucketing part 2
&lt;/div>
&lt;/h3>
&lt;p>Once you&amp;rsquo;ve removed dupes, identified themes, and removed excess items (depending on your team size you&amp;rsquo;ll find how many feels right; we aim for an average of 5-6 per square for a team of 10), it&amp;rsquo;s then on to actually putting them on the grid. In the past we&amp;rsquo;ve done this a variety of ways, but our most recent process seemed to be quite efficient. We gave each item 60 seconds; at the end of that minute, wherever the item was, it was left there. This forced some quick discussion on impact and difficulty, but in the end left us with a very good hit rate without taking multiple hours to complete the exercise.&lt;/p>
&lt;h3 id="final-pass" >
&lt;div>
Final pass
&lt;/div>
&lt;/h3>
&lt;p>We intentionally design it so that low effort and high impact is in the top right corner. Finally once everything is on there we allocate names to the tasks, and put boxes around items we&amp;rsquo;re planning to do in the coming months. The boxes make it very clear what we are doing as well as, explicitly, what we are not. The initials or names make it clear how loaded down people are. If your name is on 3 tasks that are high difficulty, then you&amp;rsquo;re likely over allocated.&lt;/p>
&lt;p>At this point things usually fall out pretty quickly and we emerge with some rough roadmap that in retrospect we&amp;rsquo;ve followed pretty accurately.&lt;/p></description></item><item><title>The missing PostgreSQL documentation</title><link>/2013/08/07/The-missing-PostgreSQL-documentation/</link><pubDate>Wed, 07 Aug 2013 12:55:56 -0800</pubDate><guid>/2013/08/07/The-missing-PostgreSQL-documentation/</guid><description>&lt;p>For a couple of years I&amp;rsquo;ve complained about the Postgres documentation and at the same time paraded it as one of the best sets of documentation I&amp;rsquo;ve encountered. In many ways the reason I veer towards &lt;a href="http://www.postgresql.org">Postgres&lt;/a> as well as &lt;a href="http://www.python.org">Python&lt;/a> and &lt;a href="http://www.djangoproject.com">Django&lt;/a> is the quality of their documentation. If you need to find details about something it&amp;rsquo;s documented, and more importantly well and thoroughly documented.&lt;/p>
&lt;p>In large part I came to Python by happenstance through &lt;a href="http://www.djangoproject.com">Django&lt;/a>, and to Postgres through the happenstance of an employer. Yet Django itself was hardly an accident: the Django tutorial got me a large part of what I needed to know and more excited about development than I had been in some time. Python has done some work at adding docs to make this even better; sadly such work is still very much needed for PostgreSQL.&lt;/p>
&lt;h3 id="whats-missing-in-the-postgres-docs" >
&lt;div>
What&amp;rsquo;s Missing in the Postgres Docs
&lt;/div>
&lt;/h3>
&lt;p>There&amp;rsquo;s a huge variety of types of documentation; off the top of my head there&amp;rsquo;s:&lt;/p>
&lt;ul>
&lt;li>Reference docs (Postgres excels at this)&lt;/li>
&lt;li>Onboarding (Postgres tutorial huh?)&lt;/li>
&lt;li>Tailored guides (Postgres? I can haz? Nope&amp;hellip; We don&amp;rsquo;t understand&amp;hellip;.)&lt;/li>
&lt;/ul>
&lt;p>Postgres is great if you know the name of what you&amp;rsquo;re looking for, but if you don&amp;rsquo;t you&amp;rsquo;re entirely left in the dark.&lt;/p>
&lt;h3 id="understanding-the-power-of-postgres" >
&lt;div>
Understanding the power of Postgres
&lt;/div>
&lt;/h3>
&lt;p>Postgres is good enough at performance, good enough at usability, and awesome at how powerful and flexible it can be. But all of this is entirely lost if you have to know the esoteric name of what you&amp;rsquo;re looking for.&lt;/p>
&lt;p>&lt;em>What the hell is an hstore&amp;hellip; In so many ways KVstore makes infinitely more sense. In the same sense with PLV8, I have to know not only what PL stands for but V8 as well, versus simply calling it the JavaScript extension for Postgres.&lt;/em>&lt;/p>
&lt;p>I understand there are plenty of reasons why some of these things are the way they are, but it&amp;rsquo;s also limiting how great the broader perception is. Externally Postgres is seen as a hard to use DB that, well, is just a database, versus a system giving developers a set of powerful and useful functions to make their lives better.&lt;/p>
&lt;h3 id="the-solution" >
&lt;div>
The Solution
&lt;/div>
&lt;/h3>
&lt;p>Let&amp;rsquo;s fix things; there are a ton of people that would love to know more about all things Postgres. This ranges from a good set of onboarding docs to specific blog posts on topics people are curious about. Just last week I got an email about improving &lt;strong>the&lt;/strong> Postgres tutorial&amp;hellip; Yes, there&amp;rsquo;s a tutorial hidden in the &lt;a href="http://www.postgresql.org/docs/9.2/static/tutorial.html">2000 page set of documentation for Postgres&lt;/a>. It&amp;rsquo;s simply old, mostly uninteresting, and, well, just needs to be completely recreated. A great alternative would be a few tutorials/guides for:&lt;/p>
&lt;ul>
&lt;li>Noobs to databases in general (Total 101 guide)&lt;/li>
&lt;li>Building and architecting your application with Postgres (App Devs)&lt;/li>
&lt;li>Administering and maintaining Postgres (DBAs)&lt;/li>
&lt;li>SQL and reporting in Postgres (consumers of data, analysts, product people, marketing, etc.)&lt;/li>
&lt;/ul>
&lt;p>If jumping in and contributing to fixing the core tutorial isn&amp;rsquo;t your cup of tea because you don&amp;rsquo;t want to learn and write in &lt;a href="http://www-sul.stanford.edu/tools/tutorials/html2.0/gentle.html">SGML&lt;/a>, send a pull request to &lt;a href="http://postgresguide.com">postgresguide.com&lt;/a> or do a &lt;a href="mailto:craig.kerstiens@gmail.com">guest post on my blog&lt;/a>. If that&amp;rsquo;s too much effort, please just let us know what you want to see - &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens at gmail.com&lt;/a>&lt;/p></description></item><item><title>A look at Foreign Data Wrappers</title><link>/2013/08/05/A-look-at-Foreign-Data-Wrappers/</link><pubDate>Mon, 05 Aug 2013 12:55:56 -0800</pubDate><guid>/2013/08/05/A-look-at-Foreign-Data-Wrappers/</guid><description>&lt;p>There are two particular sets of features that continue to keep me very excited about the momentum of Postgres. And while PostgreSQL has had some great momentum in the past few years these features may give it an entirely new pace altogether. One is extensions, which is really its own category. Dimitri Fontaine was talking about doing a full series just on extensions, so here&amp;rsquo;s hoping he does so I don&amp;rsquo;t have to :)&lt;/p>
&lt;p>The other thing, a subset of extensions which I consider entirely separate, is foreign data wrappers or FDWs. FDWs allow you to connect to other data sources from within Postgres. From there you can query them with SQL and join across disparate data sets, even across different systems. Recently I had a good excuse to give the &lt;code>postgres_fdw&lt;/code> a try. And while I&amp;rsquo;ve blogged about the Redis FDW previously, the Postgres one is particularly exciting because with PostgreSQL 9.3 it will ship as a contrib module, which means all Postgres installs should have it&amp;hellip; you just have to turn it on.&lt;/p>
&lt;p>Let&amp;rsquo;s take a look at getting it set up and then dig into it a bit. First, because I don&amp;rsquo;t have Postgres 9.3 sitting around on my system, I&amp;rsquo;m going to provision one from Heroku Postgres:&lt;/p>
&lt;pre>&lt;code>$ heroku addons:add heroku-postgresql:crane --version 9.3
&lt;/code>&lt;/pre>
&lt;p>Once it becomes available I&amp;rsquo;m going to connect to it then enable the extension:&lt;/p>
&lt;pre>&lt;code>$ heroku pg:psql BLACK -acraig
# CREATE EXTENSION postgres_fdw;
&lt;/code>&lt;/pre>
&lt;p>Now it&amp;rsquo;s there, so we can actually start using it. To use the FDW there are four basic things you&amp;rsquo;ll want to do:&lt;/p>
&lt;ol>
&lt;li>Create the remote server&lt;/li>
&lt;li>Create a user mapping for the remote server&lt;/li>
&lt;li>Create your foreign tables&lt;/li>
&lt;li>Start querying some things&lt;/li>
&lt;/ol>
&lt;h3 id="the-setup" >
&lt;div>
The setup
&lt;/div>
&lt;/h3>
&lt;p>You&amp;rsquo;ll only need to do each of the following once; once your server, user mapping, and foreign tables are all set up you can simply query away. This is a nice advantage over dblink, which only exists for the current session. &lt;em>One downside I did find was that you can&amp;rsquo;t use a full Postgres connection string, which would make setting it up much simpler&lt;/em>. So on to setting up our server:&lt;/p>
&lt;pre>&lt;code># CREATE SERVER app_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (dbname 'dbnamehere', host 'hostname-here');
&lt;/code>&lt;/pre>
&lt;p>Next we&amp;rsquo;ll actually create our user mapping. In this case we&amp;rsquo;ll take the remote username and password and map it to our current user we&amp;rsquo;re already connected with.&lt;/p>
&lt;pre>&lt;code># CREATE USER MAPPING FOR CURRENT_USER
SERVER app_db
OPTIONS (user 'remote_user', password 'remote_password');
&lt;/code>&lt;/pre>
&lt;p>And finally we&amp;rsquo;re going to configure our tables. &lt;em>There were some additional pains here as there wasn&amp;rsquo;t a perfectly clean way to generate the &lt;code>CREATE TABLE&lt;/code>. Sure you could pg_dump just that table, but overall it felt a bit cludgey.&lt;/em>&lt;/p>
&lt;pre>&lt;code># CREATE FOREIGN TABLE users
(
id integer,
email text,
created_at timestamp,
first_name text,
last_name text
)
SERVER app_db OPTIONS (table_name 'users');
&lt;/code>&lt;/pre>
&lt;p>Now we&amp;rsquo;ve got all of our local data, as well as remote data. That report against two databases, where you previously wrote a Ruby or Python script, ran a query, constructed another query, then executed it, you can now do directly in your database. We can simply query our new table: &lt;code>SELECT * FROM users LIMIT 5;&lt;/code>&lt;/p>
&lt;p>But the real power of foreign data wrappers goes well beyond just Postgres to Postgres. Having a defined contract for translating from one system to another will really allow reinventing the way we work with data. This is especially true with large datasets where doing ETL on terabytes of data takes longer than asking the questions of it.&lt;/p>
&lt;p>While we&amp;rsquo;re waiting for more FDWs to be ready to use in production situations the Postgres FDW is a great start, &lt;em>though the Redis one is on its way&lt;/em>. Even better is that it ships with standard installs of Postgres, meaning it will see more usage and help push them to advance further.&lt;/p>
&lt;p>&lt;em>One final nicety, you&amp;rsquo;re not required to have ALL Postgres 9.3 DBs, just one that can then connect to the others, so go ahead and give it try :)&lt;/em>&lt;/p></description></item><item><title>Postgres Dollar Quoting</title><link>/2013/08/02/Postgres-Dollar-Quoting/</link><pubDate>Fri, 02 Aug 2013 12:55:56 -0800</pubDate><guid>/2013/08/02/Postgres-Dollar-Quoting/</guid><description>&lt;p>After my most recent post on &lt;a href="/2013/07/29/documenting-your-postgres-database/">documenting your database&lt;/a> I had a colleague and friend chime in:&lt;/p>
&lt;blockquote>
&lt;p>@craigkerstiens You may want to mention for another post the generality of dollar quoting: it&amp;rsquo;s not just for CREATE FUNCTION. &amp;ndash; &lt;a href="https://twitter.com/danfarina/status/362007008079126528">@danfarina&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;p>Luckily I was able to convince him to create the post. You can read a bit more about him below, but without further ado, here&amp;rsquo;s a bit on dollar quoting within Postgres:&lt;/p>
&lt;p>Postgres supports two forms of entry of data literals into the system.
One is the familiar single-quote:&lt;/p>
&lt;pre>&lt;code>=&amp;gt; SELECT 'hello';
?column?
----------
hello
(1 row)
&lt;/code>&lt;/pre>
&lt;p>This format is problematic when the text itself contains single
quotes.&lt;/p>
&lt;p>Postgres also supports another way to enter data
literals, most often seen in &lt;code>CREATE FUNCTION&lt;/code>, but can be profitably
used anywhere. This is called &amp;ldquo;dollar quoting,&amp;rdquo; and it looks like
this:&lt;/p>
&lt;pre>&lt;code>=&amp;gt; SELECT $$hello's the name of the game$$;
?column?
------------------------------
hello's the name of the game
(1 row)
&lt;/code>&lt;/pre>
&lt;p>If one needs nested dollar quoting, one can specify a custom tag, much
like the &amp;lsquo;heredoc&amp;rsquo; feature seen in some programming languages:&lt;/p>
&lt;pre>&lt;code>=&amp;gt; SELECT $goodbye$hello's the name of the $$ game$goodbye$;
?column?
---------------------------------
hello's the name of the $$ game
(1 row)
&lt;/code>&lt;/pre>
&lt;p>This can appear anywhere where single quotes would otherwise be,
simplifying tasks like using contractions in database object comments,
for example:&lt;/p>
&lt;pre>&lt;code>=&amp;gt; CREATE TABLE described(a int);
=&amp;gt; COMMENT ON TABLE described IS $$I'm describing this,
including newlines and an apostrophe in the contraction &amp;quot;I'm.&amp;quot;$$;
&lt;/code>&lt;/pre>
&lt;p>Or, alternatively, entry of literals for types that may include
apostrophes in their serialization, such as &lt;code>text&lt;/code> or &lt;code>json&lt;/code>:&lt;/p>
&lt;pre>&lt;code>=&amp;gt; CREATE TABLE json(data json);
=&amp;gt; INSERT INTO json(data) VALUES
($${&amp;quot;quotation&amp;quot;: &amp;quot;'there is no time like the present'&amp;quot;}$$);
&lt;/code>&lt;/pre>
&lt;h3 id="security" >
&lt;div>
Security
&lt;/div>
&lt;/h3>
&lt;p>Even though dollar quotes can be used to reduce the pain of many
quoting problems, don&amp;rsquo;t be tempted to use them to avoid SQL injection:
an adversary that knows one is using dollar quoting can still mount
exactly the same kind of attacks as if one were using single quotes.&lt;/p>
&lt;p>There is also no need to: anywhere a data literal can appear, one can
instead use parameter binding (e.g. &lt;code>$1&lt;/code>, &lt;code>$2&lt;/code>, &lt;code>$3&lt;/code>&amp;hellip;), which one&amp;rsquo;s
Postgres driver should support. Nevertheless, for data or scripts one
is working with by hand, dollar quoting can make things much easier to
read.&lt;/p>
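&lt;p>As a quick sketch of that binding, psql itself can demonstrate it with &lt;code>PREPARE&lt;/code> and &lt;code>EXECUTE&lt;/code>, and the two techniques compose nicely:&lt;/p>
&lt;pre>&lt;code>=&amp;gt; PREPARE greet(text) AS SELECT 'hello ' || $1;
=&amp;gt; EXECUTE greet($$world's$$);
   ?column?
---------------
 hello world's
(1 row)
&lt;/code>&lt;/pre>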
&lt;h3 id="about-the-author" >
&lt;div>
About the Author
&lt;/div>
&lt;/h3>
&lt;p>Daniel Farina is a long-time colleague and friend, having worked together at 5 different companies. He&amp;rsquo;s part of the &lt;a href="https://postgres.heroku.com">Heroku Postgres&lt;/a> team as the resident tuple groomer, and the creator of &lt;a href="https://github.com/wal-e/wal-e">WAL-E&lt;/a>.&lt;/p>
&lt;p>&lt;em>As is always the case if you have articles you&amp;rsquo;d like to see created or if you&amp;rsquo;re interested in doing a guest post please feel free to drop me a line &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens at gmail.com&lt;/a>. And if you have articles you feel are helpful to others in the Postgres world drop me a note as well for including them in &lt;a href="http://www.postgresweekly.com">Postgres Weekly&lt;/a>.&lt;/em>&lt;/p></description></item><item><title>Documenting your PostgreSQL database</title><link>/2013/07/29/Documenting-your-PostgreSQL-database/</link><pubDate>Mon, 29 Jul 2013 12:55:56 -0800</pubDate><guid>/2013/07/29/Documenting-your-PostgreSQL-database/</guid><description>&lt;p>Just a few days ago I was surprised by what someone was doing with their database, and not in the typical horrifying-travesty-against-mankind way. Rather, it was a feature that, while I was familiar with it, I&amp;rsquo;d never seen anyone take full advantage of - &lt;code>COMMENT&lt;/code>, or describing tables. Postgres has a nice facility for you to provide a description for just about anything:&lt;/p>
&lt;ul>
&lt;li>Table&lt;/li>
&lt;li>Column&lt;/li>
&lt;li>Function&lt;/li>
&lt;li>Schema&lt;/li>
&lt;li>View&lt;/li>
&lt;li>Index&lt;/li>
&lt;li>Etc.&lt;/li>
&lt;/ul>
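&lt;p>The same &lt;code>COMMENT ON&lt;/code> syntax covers each of these. A quick sketch, with hypothetical object names:&lt;/p>
&lt;pre>&lt;code>COMMENT ON SCHEMA reporting IS 'Read-only schema for the analytics team';
COMMENT ON VIEW active_users IS 'Users with activity in the last 30 days';
COMMENT ON FUNCTION normalize_email(text) IS 'Lowercases and trims an email address';
COMMENT ON INDEX idx_user_created IS 'Supports per-day signup reports';
&lt;/code>&lt;/pre>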
&lt;p>The specific use case was a database acting as a datamart, pulling in data from multiple sources to be able to report against disparate data. Over the years I&amp;rsquo;ve seen this occur in one of three ways. The first is that a limited set of people, typically one person, has knowledge of all the datasources and is thus the sole individual responsible for creating reports and answering questions of the data. The second is wide-open access to anyone that wishes for it. In this case you often have people asking questions of the data and, because they don&amp;rsquo;t understand the relationships, coming to entirely wrong conclusions. The final approach is to create some external documentation: entity relationship diagrams, data dictionaries, etc. This last one often works okay enough, but frequently suffers from lack of updates and being too heavyweight.&lt;/p>
&lt;p>A better solution, and an all-around good practice, is simply documenting clearly within the database itself. Comment each table and column, just as you would outside of your DB, and then things can be quite clear when working interactively inside the database:&lt;/p>
&lt;pre>&lt;code>COMMENT ON TABLE products IS 'Products catalog';
COMMENT ON COLUMN products.price is 'Current price of a single item purchased';
&lt;/code>&lt;/pre>
&lt;p>While the above is an obvious example, naming even the most mundane columns can help create more accurate reports. Then of course when you want to inspect your DB it&amp;rsquo;s quite clear:&lt;/p>
&lt;pre>&lt;code># \d+ users
Table &amp;quot;public.users&amp;quot;
Column | Type | ... | Description
------------+-----------------------------+-...-+-----------------------------------------
id | integer | ... | auto serial pk
first_name | character varying(50) | ... | required first name of user
last_name | character varying(50) | ... | required last name of user
email | character varying(255) | ... | email address of account
data | hstore | ... | mix of data, city, state, gender
created_at | timestamp without time zone | ... | when account was created, not confirmed
updated_at | timestamp without time zone | ... | time any details were last updated
Indexes:
&amp;quot;idx_user_created&amp;quot; btree (date_trunc('day'::text, created_at))
Has OIDs: no
&lt;/code>&lt;/pre>
&lt;p>But it doesn&amp;rsquo;t necessarily have to stop there. Which actually brings me to one other item, you should be commenting your SQL just the same. SQL comments can be done easily by just starting a line with &lt;code>--&lt;/code>, or you can have it at the end of the line with further info. Here&amp;rsquo;s a nice example:&lt;/p>
&lt;pre>&lt;code>-- Query aggregates all project names that have open past due tasks grouped by email
SELECT
users.email,
array_to_string(array_agg(projects.name), ',') AS projects -- Aggregate all projects and separate by comma
FROM
projects,
tasks,
users
-- A user has a project, which has tasks
WHERE projects.id = tasks.project_id
-- Check for tasks that are due before now and not done yet
AND tasks.due_at &amp;gt; tasks.completed_at
AND tasks.due_at &amp;lt; now()
AND users.id = projects.user_id
GROUP BY
users.email
&lt;/code>&lt;/pre>
&lt;p>You comment your code, why shouldn&amp;rsquo;t you comment your database?&lt;/p></description></item><item><title>hstore vs. JSON - Which to use in Postgres</title><link>/2013/07/03/hstore-vs.-JSON-Which-to-use-in-Postgres/</link><pubDate>Wed, 03 Jul 2013 12:55:56 -0800</pubDate><guid>/2013/07/03/hstore-vs.-JSON-Which-to-use-in-Postgres/</guid><description>&lt;p>If you&amp;rsquo;re deciding what to put in &lt;a href="http://www.amazon.com/Seven-Databases-Weeks-Modern-Movement/dp/1934356921?tag=mypred-20">Postgres and what not to&lt;/a>, consider that Postgres can be a &lt;a href="/2012/06/11/schemaless-django/">perfectly good schema-less database&lt;/a>. Of course, as soon as people realized this, the common question became: is hstore or JSON better? Which do I use, and in what cases? Well first, if you&amp;rsquo;re not familiar, check out some previous material on them:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://www.postgresql.org/docs/9.2/static/hstore.html">hstore on PostgresGuide&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.postgresql.org/docs/9.2/static/hstore.html">hstore in Postgres docs&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.craigkerstiens.com/2012/06/11/schemaless-django/">hstore with Django&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.2#JSON_datatype">JSON datatype&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://postgres.heroku.com/blog/past/2013/6/5/javascript_in_your_postgres/">JavaScript support in Postgres&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>If you&amp;rsquo;re already up to date with both of them, but are still wondering which to use, let&amp;rsquo;s dig in.&lt;/p>
&lt;h3 id="hstore" >
&lt;div>
hstore
&lt;/div>
&lt;/h3>
&lt;p>hstore is a key-value store directly within your database. It&amp;rsquo;s been a favorite of mine for some time. hstore gives you flexibility when working with your schema, as you don&amp;rsquo;t have to define models ahead of time. Its two big limitations are that 1. it only deals with text and 2. it&amp;rsquo;s not a full document store, meaning you can&amp;rsquo;t nest objects.&lt;/p>
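&lt;p>The text-only limitation is easy to see for yourself - even a number stored in an hstore comes back as text:&lt;/p>
&lt;pre>&lt;code>=&amp;gt; SELECT pg_typeof('age=&amp;gt;27'::hstore -&amp;gt; 'age');
 pg_typeof
-----------
 text
(1 row)
&lt;/code>&lt;/pre>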
&lt;p>The major benefits of hstore include the ability to index on it, robust support for various operators, and of course the obvious flexibility with your data. Some of the basic operators available include:&lt;/p>
&lt;p>Return the value from column &lt;code>foo&lt;/code> for key &lt;code>bar&lt;/code>:&lt;/p>
&lt;pre>&lt;code>foo-&amp;gt;'bar'
&lt;/code>&lt;/pre>
&lt;p>Does the specified column &lt;code>foo&lt;/code> contain a key &lt;code>bar&lt;/code>:&lt;/p>
&lt;pre>&lt;code>foo?'bar'
&lt;/code>&lt;/pre>
&lt;p>Does the specified column &lt;code>foo&lt;/code> contain a value of &lt;code>baz&lt;/code> for key &lt;code>bar&lt;/code>:&lt;/p>
&lt;pre>&lt;code>foo @&amp;gt; 'bar=&amp;gt;baz'
&lt;/code>&lt;/pre>
&lt;p>Perhaps one of the best parts of hstore is that you can index on it. In particular, Postgres &lt;code>gin&lt;/code> and &lt;code>gist&lt;/code> indexes allow you to index all keys and values within an hstore. A talk by &lt;a href="http://www.twitter.com/XoF">Christophe Pettus&lt;/a> of PgExperts actually highlights some &lt;a href="http://thebuild.com/presentations/pg-as-nosql-pgday-fosdem-2013.pdf">performance details of hstore with indexes&lt;/a>. To give away the big punchline: in several cases hstore with gin/gist beats MongoDB in performance.&lt;/p>
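&lt;p>As a sketch, assuming a &lt;code>users&lt;/code> table with an hstore column named &lt;code>data&lt;/code>, creating such an index is a one-liner:&lt;/p>
&lt;pre>&lt;code>CREATE INDEX idx_users_data ON users USING gin (data);

-- Containment queries like this can then use the index
SELECT * FROM users WHERE data @&amp;gt; 'city=&amp;gt;Chicago';
&lt;/code>&lt;/pre>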
&lt;h3 id="json" >
&lt;div>
json
&lt;/div>
&lt;/h3>
&lt;p>JSON, in contrast to hstore, is a full document datatype. In addition to nesting objects, you have support for more than just text (read: numbers). As you insert JSON into Postgres it will automatically ensure it&amp;rsquo;s valid JSON, and error if it&amp;rsquo;s not. JSON gets a lot better come Postgres 9.3 as well, with &lt;a href="http://www.postgresql.org/docs/devel/static/functions-json.html">some built-in operators&lt;/a>. Though if you need more functionality today you should look at &lt;a href="https://code.google.com/p/plv8js/wiki/PLV8">PLV8&lt;/a>.&lt;/p>
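&lt;p>You can see that validation in action with a quick sketch (the &lt;code>events&lt;/code> table here is hypothetical):&lt;/p>
&lt;pre>&lt;code>=&amp;gt; CREATE TABLE events (payload json);
=&amp;gt; INSERT INTO events VALUES ('{&amp;quot;valid&amp;quot;: true}');
INSERT 0 1
=&amp;gt; INSERT INTO events VALUES ('{not valid}');
ERROR:  invalid input syntax for type json
&lt;/code>&lt;/pre>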
&lt;h3 id="which-to-use" >
&lt;div>
Which to Use
&lt;/div>
&lt;/h3>
&lt;p>So which do you actually want to use in your application? If you&amp;rsquo;re already using JSON and simply want to store it in your database then the JSON datatype is often the correct pick. However, if you&amp;rsquo;re just looking for flexibility with your data model then hstore is likely the path you want to take. hstore will give you much of the flexibility you want as well as a good ability to query your data in a performant manner. Of course much of this starts to change in Postgres 9.3.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Pivoting in Postgres</title><link>/2013/06/27/Pivoting-in-Postgres/</link><pubDate>Thu, 27 Jun 2013 12:55:56 -0800</pubDate><guid>/2013/06/27/Pivoting-in-Postgres/</guid><description>&lt;p>Earlier today on an internal Heroku group alias there was a &lt;a href="https://postgres.heroku.com/dataclips">dataclip&lt;/a> shared. The dataclip listed off some data grouped by a category, there was a reply a few minutes later with a modification to the query that used the &lt;code>crosstab&lt;/code> function to pivot directly in SQL. There were immediately several reactions on the list that went something like this:&lt;/p>
&lt;p>&lt;img src="https://f.cl.ly/items/1b0G101r0B2W243b2933/Image%202013.06.27%202%3A06%3A23%20PM.gif" alt="mindblown">&lt;/p>
&lt;p>While a mostly simple function in Postgres (there are a few rough edges), it really is all too handy. So here it is in action, taking some data that looks like:&lt;/p>
&lt;ul>
&lt;li>row identifier, in this case date&lt;/li>
&lt;li>category grouping, in this case OS&lt;/li>
&lt;li>value&lt;/li>
&lt;/ul>
&lt;p>Given a really basic query that generates some sample data it may look something like this:&lt;/p>
&lt;pre>&lt;code>SELECT generate_series AS date,
b.desc AS TYPE,
(random() * 10000 + 1)::int AS val
FROM generate_series((now() - '100 days'::interval)::date, now()::date, '1 day'::interval),
(SELECT unnest(ARRAY['OSX', 'Windows', 'Linux']) AS DESC) b;
&lt;/code>&lt;/pre>
&lt;p>You get results that look like:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>But of course this isn&amp;rsquo;t overly helpful in comparing day to day overall. You can do so on an OS-by-OS basis, but it&amp;rsquo;s annoying enough as is. The easy solution is to simply use a pivot table on your data. Most people at this point would pull it up into Excel or Google Docs, or you can do it directly in Postgres. To do so you&amp;rsquo;ll first enable the extension &lt;code>tablefunc&lt;/code>:&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION tablefunc;
&lt;/code>&lt;/pre>
&lt;p>Then you&amp;rsquo;ll use the crosstab function. The function looks something like:&lt;/p>
&lt;pre>&lt;code>SELECT *
FROM crosstab(
'SELECT row_name, category_grouping, value FROM foo',
'SELECT category_names FROM bar')
AS
ct_result (category_name text, category1 text, category2 text, etc.)
&lt;/code>&lt;/pre>
&lt;p>Let&amp;rsquo;s see it in action. Given the same query we used to generate fake data, we can now pivot on it directly in PostgreSQL:&lt;/p>
&lt;pre>&lt;code>SELECT *
FROM crosstab(
'SELECT
a date,
b.desc AS os,
(random() * 10000 + 1)::int AS value
FROM generate_series((now() - ''100 days''::interval)::date, now()::date, ''1 DAY''::interval) a,
(SELECT unnest(ARRAY[''OSX'', ''Windows'', ''Linux'']) AS DESC) b ORDER BY 1,2
','SELECT unnest(ARRAY[''OSX'', ''Windows'', ''Linux''])'
)
AS ct(date date, OSX int, Windows int, Linux int);
&lt;/code>&lt;/pre>
&lt;p>And see some results:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>Have fun analyzing your data directly in your DB now. And as always if you have feedback/questions/requests please feel free to drop me a line &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a>&lt;/p></description></item><item><title>Javascript Functions for PostgreSQL</title><link>/2013/06/25/Javascript-Functions-for-PostgreSQL/</link><pubDate>Tue, 25 Jun 2013 12:55:56 -0800</pubDate><guid>/2013/06/25/Javascript-Functions-for-PostgreSQL/</guid><description>&lt;p>Javascript in Postgres has gotten a good bit of love lately, part of that is from &lt;a href="http://postgres.herpku.com">Heroku Postgres&lt;/a> recently &lt;a href="https://postgres.heroku.com/blog/past/2013/6/5/javascript_in_your_postgres/">adding support for Javascript&lt;/a> and part from a variety of people championing the power of it such as &lt;a href="http://twitter.com/leinweber">@leinweber&lt;/a> (&lt;a href="http://www.youtube.com/watch?v=fRupMAVdmWA">Embracing the web with JSON and PLV8&lt;/a>) and &lt;a href="http://twitter.com/selenamarie">@selenamarie&lt;/a> (&lt;a href="https://speakerdeck.com/selenamarie/schema-liberation-with-json-and-plv8-and-postgres">schema liberation with JSON and PLV8&lt;/a>). In a recent conversation it was pointed out that it seems a bit of headache to have to create your own functions, or at least having an initial collection would make it that much more powerful. While many can look forward to &lt;a href="http://www.postgresql.org/docs/9.3/static/functions-json.html">PostgreSQL 9.3&lt;/a> which will have a bit more built in support for JSON a few functions can really help make it more useful today.&lt;/p>
&lt;p>These are courtesy of &lt;a href="http://bitfission.com">Will Leinweber&lt;/a>. For each of the following functions I&amp;rsquo;ll highlight an example of using it as well. To get an idea of the data it&amp;rsquo;s being run on:&lt;/p>
&lt;pre>&lt;code>select * from example;
data
--------------------------------------------
{&amp;quot;name&amp;quot;:&amp;quot;Craig Kerstiens&amp;quot;, +
&amp;quot;age&amp;quot;:27, +
&amp;quot;siblings&amp;quot;:1, +
&amp;quot;numbers&amp;quot;:[ +
{&amp;quot;type&amp;quot;:&amp;quot;work&amp;quot;, +
&amp;quot;number&amp;quot;:&amp;quot;123-456-7890&amp;quot;}, +
{&amp;quot;type&amp;quot;:&amp;quot;home&amp;quot;, +
&amp;quot;number&amp;quot;:&amp;quot;456-123-7890&amp;quot;}]}
(1 row)
&lt;/code>&lt;/pre>
&lt;h3 id="get_text" >
&lt;div>
get_text
&lt;/div>
&lt;/h3>
&lt;pre>&lt;code>CREATE OR REPLACE FUNCTION
get_text(key text, data json)
RETURNS text AS $$
return data[key];
$$ LANGUAGE plv8 IMMUTABLE STRICT;
&lt;/code>&lt;/pre>
&lt;p>Then using the function:&lt;/p>
&lt;pre>&lt;code>select get_text('name', data) from example;
get_text
----------------
Craig Kerstiens
(1 row)
&lt;/code>&lt;/pre>
&lt;h3 id="get_numeric" >
&lt;div>
get_numeric
&lt;/div>
&lt;/h3>
&lt;pre>&lt;code>CREATE OR REPLACE FUNCTION
get_numeric(key text, data json)
RETURNS numeric AS $$
return data[key];
$$ LANGUAGE plv8 IMMUTABLE STRICT;
&lt;/code>&lt;/pre>
&lt;p>Then using the function:&lt;/p>
&lt;pre>&lt;code>select get_numeric('siblings', data) from example;
get_numeric
----------------
1
(1 row)
&lt;/code>&lt;/pre>
&lt;h3 id="json_select" >
&lt;div>
json_select
&lt;/div>
&lt;/h3>
&lt;pre>&lt;code>create or replace function
json_select(selector text, data json)
returns json as $$
exports = {};
(function(a){function z(a){return{sel:q(a)[1],match:function(a){return y(this.sel,a)},forEach:function(a,b){return x(this.sel,a,b)}}}function y(a,b){var c=[];x(a,b,function(a){c.push(a)});return c}function x(a,b,c,d,e,f){var g=a[0]===&amp;quot;,&amp;quot;?a.slice(1):[a],h=[],i=!1,j=0,k=0,l,m;for(j=0;j&amp;lt;g.length;j++){m=w(b,g[j],d,e,f),m[0]&amp;amp;&amp;amp;(i=!0);for(k=0;k&amp;lt;m[1].length;k++)h.push(m[1][k])}if(h.length&amp;amp;&amp;amp;typeof b==&amp;quot;object&amp;quot;){h.length&amp;gt;=1&amp;amp;&amp;amp;h.unshift(&amp;quot;,&amp;quot;);if(u(b))for(j=0;j&amp;lt;b.length;j++)x(h,b[j],c,undefined,j,b.length);else for(l in b)b.hasOwnProperty(l)&amp;amp;&amp;amp;x(h,b[l],c,l)}i&amp;amp;&amp;amp;c&amp;amp;&amp;amp;c(b)}function w(a,b,c,d,e){var f=[],g=b[0]===&amp;quot;&amp;gt;&amp;quot;?b[1]:b[0],h=!0,i;g.type&amp;amp;&amp;amp;(h=h&amp;amp;&amp;amp;g.type===v(a)),g.id&amp;amp;&amp;amp;(h=h&amp;amp;&amp;amp;g.id===c),h&amp;amp;&amp;amp;g.pf&amp;amp;&amp;amp;(g.pf===&amp;quot;:nth-last-child&amp;quot;?d=e-d:d++,g.a===0?h=g.b===d:(i=(d-g.b)%g.a,h=!i&amp;amp;&amp;amp;d*g.a+g.b&amp;gt;=0));if(h&amp;amp;&amp;amp;g.has){var j=function(){throw 42};for(var k=0;k&amp;lt;g.has.length;k++){try{x(g.has[k],a,j)}catch(l){if(l===42)continue}h=!1;break}}h&amp;amp;&amp;amp;g.expr&amp;amp;&amp;amp;(h=p(g.expr,a)),b[0]!==&amp;quot;&amp;gt;&amp;quot;&amp;amp;&amp;amp;b[0].pc!==&amp;quot;:root&amp;quot;&amp;amp;&amp;amp;f.push(b),h&amp;amp;&amp;amp;(b[0]===&amp;quot;&amp;gt;&amp;quot;?b.length&amp;gt;2&amp;amp;&amp;amp;(h=!1,f.push(b.slice(2))):b.length&amp;gt;1&amp;amp;&amp;amp;(h=!1,f.push(b.slice(1))));return[h,f]}function v(a){if(a===null)return&amp;quot;null&amp;quot;;var b=typeof a;b===&amp;quot;object&amp;quot;&amp;amp;&amp;amp;u(a)&amp;amp;&amp;amp;(b=&amp;quot;array&amp;quot;);return b}function u(a){return Array.isArray?Array.isArray(a):b.call(a)===&amp;quot;[object Array]&amp;quot;}function t(a,b,c){var d=b,g={},j=i(a,b);j&amp;amp;&amp;amp;j[1]===&amp;quot; 
&amp;quot;&amp;amp;&amp;amp;(d=b=j[0],j=i(a,b)),j&amp;amp;&amp;amp;j[1]===f.typ?(g.type=j[2],j=i(a,b=j[0])):j&amp;amp;&amp;amp;j[1]===&amp;quot;*&amp;quot;&amp;amp;&amp;amp;(j=i(a,b=j[0]));for(;;){if(j===undefined)break;if(j[1]===f.ide)g.id&amp;amp;&amp;amp;e(&amp;quot;nmi&amp;quot;,j[1]),g.id=j[2];else if(j[1]===f.psc)(g.pc||g.pf)&amp;amp;&amp;amp;e(&amp;quot;mpc&amp;quot;,j[1]),j[2]===&amp;quot;:first-child&amp;quot;?(g.pf=&amp;quot;:nth-child&amp;quot;,g.a=0,g.b=1):j[2]===&amp;quot;:last-child&amp;quot;?(g.pf=&amp;quot;:nth-last-child&amp;quot;,g.a=0,g.b=1):g.pc=j[2];else{if(j[1]!==f.psf)break;if(j[2]===&amp;quot;:val&amp;quot;||j[2]===&amp;quot;:contains&amp;quot;)g.expr=[undefined,j[2]===&amp;quot;:val&amp;quot;?&amp;quot;=&amp;quot;:&amp;quot;*=&amp;quot;,undefined],j=i(a,b=j[0]),j&amp;amp;&amp;amp;j[1]===&amp;quot; &amp;quot;&amp;amp;&amp;amp;(j=i(a,b=j[0])),(!j||j[1]!==&amp;quot;(&amp;quot;)&amp;amp;&amp;amp;e(&amp;quot;pex&amp;quot;,a),j=i(a,b=j[0]),j&amp;amp;&amp;amp;j[1]===&amp;quot; &amp;quot;&amp;amp;&amp;amp;(j=i(a,b=j[0])),(!j||j[1]!==f.str)&amp;amp;&amp;amp;e(&amp;quot;sex&amp;quot;,a),g.expr[2]=j[2],j=i(a,b=j[0]),j&amp;amp;&amp;amp;j[1]===&amp;quot; &amp;quot;&amp;amp;&amp;amp;(j=i(a,b=j[0])),(!j||j[1]!==&amp;quot;)&amp;quot;)&amp;amp;&amp;amp;e(&amp;quot;epex&amp;quot;,a);else if(j[2]===&amp;quot;:has&amp;quot;){j=i(a,b=j[0]),j&amp;amp;&amp;amp;j[1]===&amp;quot; &amp;quot;&amp;amp;&amp;amp;(j=i(a,b=j[0])),(!j||j[1]!==&amp;quot;(&amp;quot;)&amp;amp;&amp;amp;e(&amp;quot;pex&amp;quot;,a);var k=q(a,j[0],!0);j[0]=k[0],g.has||(g.has=[]),g.has.push(k[1])}else if(j[2]===&amp;quot;:expr&amp;quot;){g.expr&amp;amp;&amp;amp;e(&amp;quot;mexp&amp;quot;,a);var l=o(a,j[0]);j[0]=l[0],g.expr=l[1]}else{(g.pc||g.pf)&amp;amp;&amp;amp;e(&amp;quot;mpc&amp;quot;,a),g.pf=j[2];var 
m=h.exec(a.substr(j[0]));m||e(&amp;quot;mepf&amp;quot;,a),m[5]?(g.a=2,g.b=m[5]===&amp;quot;odd&amp;quot;?1:0):m[6]?(g.a=0,g.b=parseInt(m[6],10)):(g.a=parseInt((m[1]?m[1]:&amp;quot;+&amp;quot;)+(m[2]?m[2]:&amp;quot;1&amp;quot;),10),g.b=m[3]?parseInt(m[3]+m[4],10):0),j[0]+=m[0].length}}j=i(a,b=j[0])}d===b&amp;amp;&amp;amp;e(&amp;quot;se&amp;quot;,a);return[b,g]}function s(a){if(a[0]===&amp;quot;,&amp;quot;){var b=[&amp;quot;,&amp;quot;];for(var c=c;c&amp;lt;a.length;c++){var d=r(d[c]);b=b.concat(d[0]===&amp;quot;,&amp;quot;?d.slice(1):d)}return b}return r(a)}function r(a){var b=[],c;for(var d=0;d&amp;lt;a.length;d++)if(a[d]===&amp;quot;~&amp;quot;){if(d&amp;lt;2||a[d-2]!=&amp;quot;&amp;gt;&amp;quot;)c=a.slice(0,d-1),c=c.concat([{has:[[{pc:&amp;quot;:root&amp;quot;},&amp;quot;&amp;gt;&amp;quot;,a[d-1]]]},&amp;quot;&amp;gt;&amp;quot;]),c=c.concat(a.slice(d+1)),b.push(c);if(d&amp;gt;1){var e=a[d-2]===&amp;quot;&amp;gt;&amp;quot;?d-3:d-2;c=a.slice(0,e);var f={};for(var g in a[e])a[e].hasOwnProperty(g)&amp;amp;&amp;amp;(f[g]=a[e][g]);f.has||(f.has=[]),f.has.push([{pc:&amp;quot;:root&amp;quot;},&amp;quot;&amp;gt;&amp;quot;,a[d-1]]),c=c.concat(f,&amp;quot;&amp;gt;&amp;quot;,a.slice(d+1)),b.push(c)}break}if(d==a.length)return a;return b.length&amp;gt;1?[&amp;quot;,&amp;quot;].concat(b):b[0]}function q(a,b,c,d){c||(d={});var f=[],g,h;b||(b=0);for(;;){var j=t(a,b,d);f.push(j[1]),j=i(a,b=j[0]),j&amp;amp;&amp;amp;j[1]===&amp;quot; &amp;quot;&amp;amp;&amp;amp;(j=i(a,b=j[0]));if(!j)break;if(j[1]===&amp;quot;&amp;gt;&amp;quot;||j[1]===&amp;quot;~&amp;quot;)j[1]===&amp;quot;~&amp;quot;&amp;amp;&amp;amp;(d.usesSiblingOp=!0),f.push(j[1]),b=j[0];else if(j[1]===&amp;quot;,&amp;quot;)g===undefined?g=[&amp;quot;,&amp;quot;,f]:g.push(f),f=[],b=j[0];else if(j[1]===&amp;quot;)&amp;quot;){c||e(&amp;quot;ucp&amp;quot;,j[1]),h=1,b=j[0];break}}c&amp;amp;&amp;amp;!h&amp;amp;&amp;amp;e(&amp;quot;mcp&amp;quot;,a),g&amp;amp;&amp;amp;g.push(f);var 
k;!c&amp;amp;&amp;amp;d.usesSiblingOp?k=s(g?g:f):k=g?g:f;return[b,k]}function p(a,b){if(a===undefined)return b;if(a===null||typeof a!=&amp;quot;object&amp;quot;)return a;var c=p(a[0],b),d=p(a[2],b);return l[a[1]][1](c,d)}function o(a,b){function c(a){return typeof a!=&amp;quot;object&amp;quot;||a===null?a:a[0]===&amp;quot;(&amp;quot;?c(a[1]):[c(a[0]),a[1],c(a[2])]}var d=n(a,b?b:0);return[d[0],c(d[1])]}function n(a,b){b||(b=0);var c=m(a,b),d;if(c&amp;amp;&amp;amp;c[1]===&amp;quot;(&amp;quot;){d=n(a,c[0]);var f=m(a,d[0]);(!f||f[1]!==&amp;quot;)&amp;quot;)&amp;amp;&amp;amp;e(&amp;quot;epex&amp;quot;,a),b=f[0],d=[&amp;quot;(&amp;quot;,d[1]]}else!c||c[1]&amp;amp;&amp;amp;c[1]!=&amp;quot;x&amp;quot;?e(&amp;quot;ee&amp;quot;,a+&amp;quot; - &amp;quot;+(c[1]&amp;amp;&amp;amp;c[1])):(d=c[1]===&amp;quot;x&amp;quot;?undefined:c[2],b=c[0]);var g=m(a,b);if(!g||g[1]==&amp;quot;)&amp;quot;)return[b,d];(g[1]==&amp;quot;x&amp;quot;||!g[1])&amp;amp;&amp;amp;e(&amp;quot;bop&amp;quot;,a+&amp;quot; - &amp;quot;+(g[1]&amp;amp;&amp;amp;g[1]));var h=n(a,g[0]);b=h[0],h=h[1];var i;if(typeof h!=&amp;quot;object&amp;quot;||h[0]===&amp;quot;(&amp;quot;||l[g[1]][0]&amp;lt;l[h[1]][0])i=[d,g[1],h];else{i=h;while(typeof h[0]==&amp;quot;object&amp;quot;&amp;amp;&amp;amp;h[0][0]!=&amp;quot;(&amp;quot;&amp;amp;&amp;amp;l[g[1]][0]&amp;gt;=l[h[0][1]][0])h=h[0];h[0]=[d,g[1],h[0]]}return[b,i]}function m(a,b){var d,e=j.exec(a.substr(b));if(e){b+=e[0].length,d=e[1]||e[2]||e[3]||e[5]||e[6];if(e[1]||e[2]||e[3])return[b,0,c(d)];if(e[4])return[b,0,undefined];return[b,d]}}function k(a,b){return typeof a===b}function i(a,b){b||(b=0);var d=g.exec(a.substr(b));if(!d)return undefined;b+=d[0].length;var h;d[1]?h=[b,&amp;quot; 
&amp;quot;]:d[2]?h=[b,d[0]]:d[3]?h=[b,f.typ,d[0]]:d[4]?h=[b,f.psc,d[0]]:d[5]?h=[b,f.psf,d[0]]:d[6]?e(&amp;quot;upc&amp;quot;,a):d[8]?h=[b,d[7]?f.ide:f.str,c(d[8])]:d[9]?e(&amp;quot;ujs&amp;quot;,a):d[10]&amp;amp;&amp;amp;(h=[b,f.ide,d[10].replace(/\\([^\r\n\f0-9a-fA-F])/g,&amp;quot;$1&amp;quot;)]);return h}function e(a,b){throw new Error(d[a]+(b&amp;amp;&amp;amp;&amp;quot; in '&amp;quot;+b+&amp;quot;'&amp;quot;))}function c(a){try{if(JSON&amp;amp;&amp;amp;JSON.parse)return JSON.parse(a);return(new Function(&amp;quot;return &amp;quot;+a))()}catch(b){e(&amp;quot;ijs&amp;quot;,b.message)}}var b=Object.prototype.toString,d={bop:&amp;quot;binary operator expected&amp;quot;,ee:&amp;quot;expression expected&amp;quot;,epex:&amp;quot;closing paren expected ')'&amp;quot;,ijs:&amp;quot;invalid json string&amp;quot;,mcp:&amp;quot;missing closing paren&amp;quot;,mepf:&amp;quot;malformed expression in pseudo-function&amp;quot;,mexp:&amp;quot;multiple expressions not allowed&amp;quot;,mpc:&amp;quot;multiple pseudo classes (:xxx) not allowed&amp;quot;,nmi:&amp;quot;multiple ids not allowed&amp;quot;,pex:&amp;quot;opening paren expected '('&amp;quot;,se:&amp;quot;selector expected&amp;quot;,sex:&amp;quot;string expected&amp;quot;,sra:&amp;quot;string required after '.'&amp;quot;,uc:&amp;quot;unrecognized char&amp;quot;,ucp:&amp;quot;unexpected closing paren&amp;quot;,ujs:&amp;quot;unclosed json string&amp;quot;,upc:&amp;quot;unrecognized pseudo class&amp;quot;},f={psc:1,psf:2,typ:3,str:4,ide:5},g=new RegExp('^(?:([\\r\\n\\t\\ 
]+)|([~*,&amp;gt;\\)\\(])|(string|boolean|null|array|object|number)|(:(?:root|first-child|last-child|only-child))|(:(?:nth-child|nth-last-child|has|expr|val|contains))|(:\\w+)|(?:(\\.)?(\\&amp;quot;(?:[^\\\\\\&amp;quot;]|\\\\[^\\&amp;quot;])*\\&amp;quot;))|(\\&amp;quot;)|\\.((?:[_a-zA-Z]|[^\\0-\\0177]|\\\\[^\\r\\n\\f0-9a-fA-F])(?:[_a-zA-Z0-9\\-]|[^\\u0000-\\u0177]|(?:\\\\[^\\r\\n\\f0-9a-fA-F]))*))'),h=/^\s*\(\s*(?:([+\-]?)([0-9]*)n\s*(?:([+\-])\s*([0-9]))?|(odd|even)|([+\-]?[0-9]+))\s*\)/,j=new RegExp('^\\s*(?:(true|false|null)|(-?\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d+)?)|(&amp;quot;(?:[^\\]|\\[^&amp;quot;])*&amp;quot;)|(x)|(&amp;amp;&amp;amp;|\\|\\||[\\$\\^&amp;lt;&amp;gt;!\\*]=|[=+\\-*/%&amp;lt;&amp;gt;])|([\\(\\)]))'),l={&amp;quot;*&amp;quot;:[9,function(a,b){return a*b}],&amp;quot;/&amp;quot;:[9,function(a,b){return a/b}],&amp;quot;%&amp;quot;:[9,function(a,b){return a%b}],&amp;quot;+&amp;quot;:[7,function(a,b){return a+b}],&amp;quot;-&amp;quot;:[7,function(a,b){return a-b}],&amp;quot;&amp;lt;=&amp;quot;:[5,function(a,b){return k(a,&amp;quot;number&amp;quot;)&amp;amp;&amp;amp;k(b,&amp;quot;number&amp;quot;)&amp;amp;&amp;amp;a&amp;lt;=b}],&amp;quot;&amp;gt;=&amp;quot;:[5,function(a,b){return k(a,&amp;quot;number&amp;quot;)&amp;amp;&amp;amp;k(b,&amp;quot;number&amp;quot;)&amp;amp;&amp;amp;a&amp;gt;=b}],&amp;quot;$=&amp;quot;:[5,function(a,b){return k(a,&amp;quot;string&amp;quot;)&amp;amp;&amp;amp;k(b,&amp;quot;string&amp;quot;)&amp;amp;&amp;amp;a.lastIndexOf(b)===a.length-b.length}],&amp;quot;^=&amp;quot;:[5,function(a,b){return k(a,&amp;quot;string&amp;quot;)&amp;amp;&amp;amp;k(b,&amp;quot;string&amp;quot;)&amp;amp;&amp;amp;a.indexOf(b)===0}],&amp;quot;*=&amp;quot;:[5,function(a,b){return k(a,&amp;quot;string&amp;quot;)&amp;amp;&amp;amp;k(b,&amp;quot;string&amp;quot;)&amp;amp;&amp;amp;a.indexOf(b)!==-1}],&amp;quot;&amp;gt;&amp;quot;:[5,function(a,b){return 
k(a,&amp;quot;number&amp;quot;)&amp;amp;&amp;amp;k(b,&amp;quot;number&amp;quot;)&amp;amp;&amp;amp;a&amp;gt;b}],&amp;quot;&amp;lt;&amp;quot;:[5,function(a,b){return k(a,&amp;quot;number&amp;quot;)&amp;amp;&amp;amp;k(b,&amp;quot;number&amp;quot;)&amp;amp;&amp;amp;a&amp;lt;b}],&amp;quot;=&amp;quot;:[3,function(a,b){return a===b}],&amp;quot;!=&amp;quot;:[3,function(a,b){return a!==b}],&amp;quot;&amp;amp;&amp;amp;&amp;quot;:[2,function(a,b){return a&amp;amp;&amp;amp;b}],&amp;quot;||&amp;quot;:[1,function(a,b){return a||b}]};a._lex=i,a._parse=q,a.match=function(a,b){return z(a).match(b)},a.forEach=function(a,b,c){return z(a).forEach(b,c)},a.compile=z})(typeof exports==&amp;quot;undefined&amp;quot;?window.JSONSelect={}:exports)
return JSON.stringify(
exports.match(selector,
data));
$$ LANGUAGE plv8 IMMUTABLE STRICT
&lt;/code>&lt;/pre>
&lt;p>Then using the function:&lt;/p>
&lt;pre>&lt;code>select json_select('.name nth-child(1)', data) as name, json_select('.numbers', data) as phone
from example;
name | phone
--------------------+------------------------------------------------------------------------------------------
[&amp;quot;Craig Kerstiens&amp;quot;] | [[{&amp;quot;type&amp;quot;:&amp;quot;work&amp;quot;,&amp;quot;number&amp;quot;:&amp;quot;123-456-7890&amp;quot;},{&amp;quot;type&amp;quot;:&amp;quot;home&amp;quot;,&amp;quot;number&amp;quot;:&amp;quot;456-123-7890&amp;quot;}]]
(1 row)
&lt;/code>&lt;/pre>
&lt;h3 id="javascript-injection-attack" >
&lt;div>
javascript injection attack
&lt;/div>
&lt;/h3>
&lt;p>One word of caution: a function like the following, which &lt;code>eval&lt;/code>s whatever text it&amp;rsquo;s handed, is exactly the kind of thing that opens you up to injection attacks - avoid exposing anything like it to untrusted input:&lt;/p>
&lt;pre>&lt;code>create or replace function
js(src text) returns text as $$
return eval(
&amp;quot;(function() { &amp;quot; + src + &amp;quot;})&amp;quot;
)();
$$ LANGUAGE plv8;
&lt;/code>&lt;/pre>
&lt;p>Have any others you feel are essential when starting to work with JSON? Let me know &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a>. Beyond that, give JSON and JavaScript a try inside your database.&lt;/p></description></item><item><title>Explaining your PostgreSQL data</title><link>/2013/06/13/Explaining-your-PostgreSQL-data/</link><pubDate>Thu, 13 Jun 2013 12:55:56 -0800</pubDate><guid>/2013/06/13/Explaining-your-PostgreSQL-data/</guid><description>&lt;p>I&amp;rsquo;ve written a bit before about &lt;a href="http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance/">understanding the output&lt;/a> from &lt;code>EXPLAIN&lt;/code> and &lt;code>EXPLAIN ANALYZE&lt;/code> in PostgreSQL. Though understandably, getting a grasp on execution plans could probably use some more guidance. Yet this time around I&amp;rsquo;m taking a bit of a cop-out and highlighting a few tools, rather than documenting it myself as I&amp;rsquo;ve done in a talk I frequently give, &lt;a href="https://speakerdeck.com/craigkerstiens/postgres-demystified-1">Postgres Demystified&lt;/a>.&lt;/p>
&lt;p>&lt;img src="https://f.cl.ly/items/2Y0A0H2B2q3C0622261C/Screenshot_6_13_13_9_57_AM.png" alt="Explain explained">&lt;/p>
&lt;h3 id="getting-at-the-data" >
&lt;div>
Getting at the Data
&lt;/div>
&lt;/h3>
&lt;p>The first small thing you can do is actually retrieve the data in JSON form. By adding in &lt;code>(format json)&lt;/code> right after your &lt;code>EXPLAIN&lt;/code> or &lt;code>EXPLAIN ANALYZE&lt;/code> command, it&amp;rsquo;ll return the plan in JSON, as you&amp;rsquo;d expect. To give an example:&lt;/p>
&lt;pre>&lt;code># EXPLAIN SELECT * FROM users LIMIT 1;
QUERY PLAN
--------------------------------------------------------------
Limit (cost=0.00..0.03 rows=1 width=812)
-&amp;gt; Seq Scan on users (cost=0.00..1.50 rows=50 width=812)
(2 rows)
&lt;/code>&lt;/pre>
&lt;p>Then in JSON format:&lt;/p>
&lt;pre>&lt;code>EXPLAIN (format json) SELECT * FROM users LIMIT 1;
QUERY PLAN
------------------------------------------------
[ +
{ +
&amp;quot;Plan&amp;quot;: { +
&amp;quot;Node Type&amp;quot;: &amp;quot;Limit&amp;quot;, +
&amp;quot;Startup Cost&amp;quot;: 0.00, +
&amp;quot;Total Cost&amp;quot;: 0.03, +
&amp;quot;Plan Rows&amp;quot;: 1, +
&amp;quot;Plan Width&amp;quot;: 812, +
&amp;quot;Plans&amp;quot;: [ +
{ +
&amp;quot;Node Type&amp;quot;: &amp;quot;Seq Scan&amp;quot;, +
&amp;quot;Parent Relationship&amp;quot;: &amp;quot;Outer&amp;quot;,+
&amp;quot;Relation Name&amp;quot;: &amp;quot;users&amp;quot;, +
&amp;quot;Alias&amp;quot;: &amp;quot;users&amp;quot;, +
&amp;quot;Startup Cost&amp;quot;: 0.00, +
&amp;quot;Total Cost&amp;quot;: 1.50, +
&amp;quot;Plan Rows&amp;quot;: 50, +
&amp;quot;Plan Width&amp;quot;: 812 +
} +
] +
} +
} +
]
(1 row)
&lt;/code>&lt;/pre>
&lt;p>While it&amp;rsquo;s on my list to build some interesting apps by pulling in the JSON output, others may be equally interested in taking advantage of this data in its JSON form. If you take a shot at building something with it, as always I&amp;rsquo;d love to hear about it - &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a>&lt;/p>
&lt;h3 id="despez" >
&lt;div>
Depesz
&lt;/div>
&lt;/h3>
&lt;p>Of course if your itch isn&amp;rsquo;t for building better tools for Postgres, you may just want a solution that works today. While it&amp;rsquo;s not perfect, one of the best ones out there is &lt;a href="http://explain.depesz.com/">Depesz&amp;rsquo;s explain tool&lt;/a>. You can paste in any execution plan and get a better visual representation of the result. You can also &lt;a href="http://explain.depesz.com/s/vL1">share them&lt;/a>.&lt;/p></description></item><item><title>Postgres Indexing - A collection of indexing tips</title><link>/2013/05/30/Postgres-Indexing-A-collection-of-indexing-tips/</link><pubDate>Thu, 30 May 2013 12:55:56 -0800</pubDate><guid>/2013/05/30/Postgres-Indexing-A-collection-of-indexing-tips/</guid><description>&lt;p>Even from initial reviews of my previous post on expression-based indexes I received a lot of questions and feedback around many different parts of indexing in Postgres. Here&amp;rsquo;s a mixed collection of valuable tips and guides covering much of that.&lt;/p>
&lt;h3 id="unused-indexes" >
&lt;div>
Unused Indexes
&lt;/div>
&lt;/h3>
&lt;p>In an earlier tweet I joked about some SQL that would generate the SQL to add an index to every column:&lt;/p>
&lt;pre>&lt;code># SELECT 'CREATE INDEX idx_'
|| table_name || '_'
|| column_name || ' ON '
|| table_name || ' (&amp;quot;'
|| column_name || '&amp;quot;);'
FROM information_schema.columns;
?column?
---------------------------------------------------------------------
CREATE INDEX idx_pg_proc_proname ON pg_proc (&amp;quot;proname&amp;quot;);
CREATE INDEX idx_pg_proc_pronamespace ON pg_proc (&amp;quot;pronamespace&amp;quot;);
CREATE INDEX idx_pg_proc_proowner ON pg_proc (&amp;quot;proowner&amp;quot;);
&lt;/code>&lt;/pre>
&lt;p>The reasoning behind this is that guessing whether an index will be helpful can be a bit hard within Postgres. So the easy solution is to add indexes to everything, then just observe whether they&amp;rsquo;re being used. &lt;em>Of course you want to add them to all tables/columns because you never know if the core of Postgres may be missing some needed ones&lt;/em>&lt;/p>
&lt;p>As included with the &lt;a href="https://github.com/heroku/heroku-pg-extras">pg-extras plugin for Heroku&lt;/a> you can run a query to show you all unused indexes. On Heroku simply install the plugin, then run &lt;code>heroku pg:unused_indexes&lt;/code> to see the size of each index and the number of times it has been scanned. On a non-Heroku Postgres database you can run:&lt;/p>
&lt;pre>&lt;code># SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
idx_scan as index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique AND idx_scan &amp;lt; 50 AND pg_relation_size(relid) &amp;gt; 5 * 8192
ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
pg_relation_size(i.indexrelid) DESC;
table | index | index_size | index_scans
---------------------+--------------------------------------------+------------+-------------
public.grade_levels | index_placement_attempts_on_grade_level_id | 97 MB | 0
public.observations | observations_attrs_grade_resources | 33 MB | 0
public.messages | user_resource_id_idx | 12 MB | 0
(3 rows)
&lt;/code>&lt;/pre>
&lt;h3 id="costs-of-indexing" >
&lt;div>
Costs of Indexing
&lt;/div>
&lt;/h3>
&lt;p>There are really a couple of primary costs when it comes to indexing your data. The first is the overall size of the index. Indexes take up space on disk; fortunately in most cases disk is pretty cheap. If you&amp;rsquo;re limited on disk size and not on your current performance, then it&amp;rsquo;s pretty clear which trade-off you want to take. If you do need to get the size of an index you can do that by running:&lt;/p>
&lt;pre>&lt;code># SELECT pg_size_pretty(pg_total_relation_size('idx_name'));
&lt;/code>&lt;/pre>
&lt;p>The harder trade-off to look at is the cost in terms of throughput. As data comes in there&amp;rsquo;s a cost for maintaining each index, since the indexed values have to be computed and written. If you&amp;rsquo;re doing crazy regexes in your index then you can expect this to have an impact on your write throughput.&lt;/p>
&lt;h3 id="composite-indexes-vs-multiple-indiviual-indexes" >
&lt;div>
Composite Indexes vs. Multiple Individual Indexes
&lt;/div>
&lt;/h3>
&lt;p>A composite index is an index that includes multiple columns. Given an example table of purchases:&lt;/p>
&lt;pre>&lt;code># \d purchases
Table &amp;quot;public.purchases&amp;quot;
Column | Type | Modifiers
-------------+-----------------------------+-----------
id | integer | not null
item | integer |
quantity | integer |
color | integer |
&lt;/code>&lt;/pre>
&lt;p>You might want to add an index on item, quantity, and color together. You can do this with:&lt;/p>
&lt;pre>&lt;code>CREATE INDEX idx_purchases_item_quantity_color ON purchases (item, quantity, color)
&lt;/code>&lt;/pre>
&lt;p>From now on if you include item and quantity in a query it&amp;rsquo;s likely it would use this index, just as it would if you used item, quantity, and color. If you have a large varied set of data within each of these columns, such an index can prove very useful. The caveat is that if you&amp;rsquo;re querying against only quantity and color then this index is useless; the query &lt;strong>must&lt;/strong> include the leading item column.&lt;/p>
&lt;p>In contrast, if you have three individual indexes Postgres may combine them, or simply use the one that would be the most efficient of the three.&lt;/p>
&lt;pre>&lt;code>CREATE INDEX idx_purchases_item ON purchases (item);
CREATE INDEX idx_purchases_quantity ON purchases (quantity);
CREATE INDEX idx_purchases_color ON purchases (color);
&lt;/code>&lt;/pre>
&lt;p>Of course in this case if you query any individual column it would use the index if appropriate.&lt;/p>
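To make the trade-off concrete, here is a sketch (reusing the example purchases table and index names above) of which queries can and can't benefit from each approach:

```sql
-- Can use idx_purchases_item_quantity_color: the leading column (item) is present
SELECT * FROM purchases WHERE item = 42 AND quantity > 10;

-- The composite index generally can't help here: the leading column (item) is absent
SELECT * FROM purchases WHERE quantity > 10 AND color = 3;

-- With the three individual indexes instead, Postgres may combine
-- idx_purchases_quantity and idx_purchases_color via a bitmap AND
SELECT * FROM purchases WHERE quantity > 10 AND color = 3;
```

Whether the planner actually chooses an index depends on your data distribution; EXPLAIN is the way to confirm.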
&lt;h3 id="what-else" >
&lt;div>
What Else
&lt;/div>
&lt;/h3>
&lt;p>What else do you want to know about Postgres Indexing? Drop me a line &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens at gmail.com&lt;/a> or hop over to &lt;a href="http://www.postgresguide.com">Postgres Guide&lt;/a> and &lt;a href="http://postgresguide.com/performance/indexes.html">read a little there&lt;/a> or even contribute some articles of your own.&lt;/p></description></item><item><title>Postgres Indexes – Expression/Functional Indexing</title><link>/2013/05/29/Postgres-Indexes-Expression/Functional-Indexing/</link><pubDate>Wed, 29 May 2013 12:55:56 -0800</pubDate><guid>/2013/05/29/Postgres-Indexes-Expression/Functional-Indexing/</guid><description>&lt;p>Postgres is rich with options for indexing. First you&amp;rsquo;ve got a variety of types, and beyond that you can do a variety of things with each of these such as create unique indexes, use conditions to index only a portion of your data, or create indexes based on complex expressions or functions. In cases where you commonly use various PostgreSQL functions in your application or reporting you can get some great gains from this.&lt;/p>
&lt;p>Let&amp;rsquo;s take a look at a really simple case. Given a basic user table:&lt;/p>
&lt;pre>&lt;code># \dt users
Table &amp;quot;public.users&amp;quot;
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer | not null
email | character varying(255) |
created_at | timestamp without time zone |
&lt;/code>&lt;/pre>
&lt;p>You may commonly want to run a report against it showing your signups by date. Let&amp;rsquo;s say you do this by running the query:&lt;/p>
&lt;pre>&lt;code>SELECT
count(*),
date_trunc('day', created_at)
FROM
users
GROUP BY
2;
&lt;/code>&lt;/pre>
&lt;p>If you&amp;rsquo;re commonly using &lt;code>date_trunc('day', created_at)&lt;/code> for grouping, filtering, or projecting it out you can get some great gains by creating an index on this:&lt;/p>
&lt;pre>&lt;code># CREATE INDEX idx_user_created ON users(date_trunc('day', created_at));
&lt;/code>&lt;/pre>
&lt;p>Of course you can go beyond the built in functions of Postgres and use more complicated functions you create yourself. For example if you have JSON stored within PostgreSQL, have PLV8 enabled, and want to create a Javascript function to parse and return the text for a given key:&lt;/p>
&lt;pre>&lt;code># CREATE OR REPLACE FUNCTION
get_text(key text, data json)
RETURNS text AS $$
return data[key];
$$ LANGUAGE plv8 IMMUTABLE STRICT;
&lt;/code>&lt;/pre>
&lt;p>&lt;em>Of note in the above function are &lt;code>IMMUTABLE&lt;/code> and &lt;code>STRICT&lt;/code>. Immutable specifies that the function will always return the same result for the same inputs, which is what makes it safe to index on. Strict means that if any input is &lt;code>NULL&lt;/code> the function isn&amp;rsquo;t even executed and the result is simply &lt;code>NULL&lt;/code>.&lt;/em>&lt;/p>
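As a quick sketch of what STRICT buys you (using a hypothetical add_one function rather than the plv8 one above):

```sql
-- IMMUTABLE: same input always gives the same output, which is what
-- allows the function to be used in an index.
-- STRICT: any NULL input short-circuits to a NULL result without
-- the function body ever running.
CREATE OR REPLACE FUNCTION add_one(n integer)
RETURNS integer AS $$
  SELECT $1 + 1;
$$ LANGUAGE sql IMMUTABLE STRICT;

SELECT add_one(1);     -- 2
SELECT add_one(NULL);  -- NULL; the body is never evaluated
```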
&lt;p>Given some example data inside your JSON field:&lt;/p>
&lt;pre>&lt;code>{
&amp;quot;name&amp;quot;: &amp;quot;Craig Kerstiens&amp;quot;,
&amp;quot;location&amp;quot;: &amp;quot;San Francisco&amp;quot;,
&amp;quot;numbers&amp;quot;: [
{
&amp;quot;type&amp;quot;: &amp;quot;work&amp;quot;,
&amp;quot;number&amp;quot;: &amp;quot;123.456.7890&amp;quot;
},
{
&amp;quot;type&amp;quot;: &amp;quot;home&amp;quot;,
&amp;quot;number&amp;quot;: &amp;quot;987.654.3210&amp;quot;
}
]
}
&lt;/code>&lt;/pre>
&lt;p>If you wanted to return just the name you could index on:&lt;/p>
&lt;pre>&lt;code># CREATE INDEX idx_name ON users(get_text('name', json_data));
&lt;/code>&lt;/pre>
&lt;p>Or even combine with built ins for a case-insensitive version:&lt;/p>
&lt;pre>&lt;code># CREATE INDEX idx_name ON users(lower(get_text('name', json_data)));
&lt;/code>&lt;/pre>
&lt;p>Indexes like all of the above can be useful when you&amp;rsquo;re filtering on something that Postgres can take advantage of. In most cases any condition works for this, with the exception of a &lt;code>LIKE&lt;/code> pattern beginning with a &lt;code>%&lt;/code>. As of Postgres 9.2, even a count(*) can in certain cases take advantage of the index, thanks to index-only scans.&lt;/p>
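For example, a filter that matches the indexed expression exactly (a sketch reusing the idx_user_created index defined earlier) lets the planner use the index:

```sql
-- Matches the indexed expression date_trunc('day', created_at),
-- so the planner can use idx_user_created
SELECT count(*)
FROM users
WHERE date_trunc('day', created_at) = '2013-05-01';

-- A filter on the raw column, e.g. WHERE created_at > '2013-05-01',
-- would not use this expression index
```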
&lt;p>Whether you&amp;rsquo;re looking to take advantage of all the power of JavaScript with JSON or another procedural language, or simply speed up a basic report using built-in functions, expression indexes can give you some great benefits.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend who has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>My SQL Bad Habits</title><link>/2013/05/26/My-SQL-Bad-Habits/</link><pubDate>Sun, 26 May 2013 12:55:56 -0800</pubDate><guid>/2013/05/26/My-SQL-Bad-Habits/</guid><description>&lt;p>I&amp;rsquo;m reasonably proficient at SQL – &lt;em>when I was pseudocoding some logic for a coworker, he pointed out that my pseudocode was what he thought was executable SQL&lt;/em>. I&amp;rsquo;m fully capable of writing clear and readable SQL – which most SQL is not. Despite that I still have several bad habits when it comes to SQL. Without further ado, here&amp;rsquo;s some of my dirty laundry, so hopefully others can avoid making the same mistakes.&lt;/p>
&lt;h3 id="ordergroup-by-column-numbers" >
&lt;div>
Order/Group by Column Numbers
&lt;/div>
&lt;/h3>
&lt;p>When quickly iterating on a query it&amp;rsquo;s a lot less typing to put the column number as the thing you want to order by. Here&amp;rsquo;s a quick lightweight example:&lt;/p>
&lt;pre>&lt;code>SELECT
email,
created_at
FROM
users
ORDER BY 2 DESC
LIMIT 5;
&lt;/code>&lt;/pre>
&lt;p>This gives me my last 5 users that have signed up for my site. Of course as soon as I have this I may want to add some data to it, like their first name so I can send them a welcome email. I quickly alter the query to:&lt;/p>
&lt;pre>&lt;code>SELECT
email,
first_name,
created_at
FROM
users
ORDER BY 2 DESC
LIMIT 5;
&lt;/code>&lt;/pre>
&lt;p>And now I have 5 users that have signed up, ordered by their first name instead. Sure, it&amp;rsquo;s obvious when you have 1 column you&amp;rsquo;re ordering by, but when you have &lt;code>GROUP BY 1, 2, 3, 4, 5, 6&lt;/code> – which is actually open in one of my tabs currently – it&amp;rsquo;s a bit more confusing&amp;hellip;&lt;/p>
&lt;p>&lt;em>Though if you really want to have some fun, share a query with someone that looks something like this:&lt;/em>&lt;/p>
&lt;pre>&lt;code>SELECT
email as &amp;quot;3&amp;quot;,
first_name &amp;quot;2&amp;quot;,
created_at &amp;quot;1&amp;quot;
FROM
users
ORDER BY &amp;quot;1&amp;quot;, &amp;quot;3&amp;quot; DESC
LIMIT 5;
&lt;/code>&lt;/pre>
&lt;h3 id="implicit-joins" >
&lt;div>
Implicit Joins
&lt;/div>
&lt;/h3>
&lt;p>I seldom use the syntax &lt;code>INNER JOIN&lt;/code>. Instead I simply put the tables in my FROM clause and ensure I have a WHERE condition joining them. The problem with that is sometimes I don&amp;rsquo;t, especially when dealing with 3 tables.&lt;/p>
&lt;pre>&lt;code>SELECT
email,
items.name,
items.price
FROM
users,
orders,
items
WHERE users.id = orders.user_id
AND orders.id = items.order_id
&lt;/code>&lt;/pre>
&lt;p>Is less clear (especially when dealing with 5-6 tables) than the alternative:&lt;/p>
&lt;pre>&lt;code>SELECT
email,
items.name,
items.price
FROM users
INNER JOIN orders on users.id = orders.user_id
INNER JOIN items on orders.id = items.order_id
&lt;/code>&lt;/pre>
&lt;h3 id="lack-of-comments" >
&lt;div>
Lack of comments
&lt;/div>
&lt;/h3>
&lt;p>I comment my SQL far less than I comment my code, yet it can be done just as easily. For example I have this in one of my queries:&lt;/p>
&lt;pre>&lt;code>SELECT convert_from(CAST(E'\\x' || array_to_string(ARRAY(
SELECT
CASE
WHEN length(r.m[1]) = 1
THEN encode(convert_to(r.m[1], 'SQL_ASCII'), 'hex')
ELSE substring(r.m[1] from 2 for 2)
END
FROM regexp_matches(url_here, '%[0-9a-f][0-9a-f]|.', 'gi') AS r(m)
), '') AS bytea), 'UTF8');
&lt;/code>&lt;/pre>
&lt;p>While this has its own issues, there&amp;rsquo;s no documentation of what it actually does. In contrast:&lt;/p>
&lt;pre>&lt;code>--- DECODES url ---
SELECT convert_from(CAST(E'\\x' || array_to_string(ARRAY(
SELECT
CASE
WHEN length(r.m[1]) = 1
THEN encode(convert_to(r.m[1], 'SQL_ASCII'), 'hex')
ELSE substring(r.m[1] from 2 for 2)
END
FROM regexp_matches(url_here, '%[0-9a-f][0-9a-f]|.', 'gi') AS r(m)
), '') AS bytea), 'UTF8');
&lt;/code>&lt;/pre>
&lt;p>Comments also work well inline at the end of a line.&lt;/p>
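For instance, a trivial sketch of inline comments:

```sql
SELECT
    email,        -- used for the welcome email
    created_at    -- signup time; everything after the -- is ignored
FROM users;       -- inline comments run to the end of the line
```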
&lt;h3 id="large-manually-generated-lists" >
&lt;div>
Large Manually Generated Lists
&lt;/div>
&lt;/h3>
&lt;p>A lot of times when working with some specific data set I&amp;rsquo;ll manually or automatically generate a list that I want to filter. A common example is filtering out staging/dev environments. I&amp;rsquo;ll often manually search and prune the list, then save that result for the queries I&amp;rsquo;m going to build going forward. This is a bit of effort but still feels reasonable; the downside is it results in something like:&lt;/p>
&lt;pre>&lt;code>SELECT
foo
FROM
bar
WHERE
bar.id NOT IN (34723, 42735, 32321, 47205, 20375, 30261, 26194, 109371, 9313, 6351, 20184, 50273, 34735, 39854, 23954, 25323, 23405, 30528, 50182, 29340, 47659, ... and the list goes on)
&lt;/code>&lt;/pre>
&lt;p>SQL is meant to be reasonable for containing some level of logic. Data changes, and hard-coding keys is going to bite you at some point; spend the extra effort and re-use something that&amp;rsquo;s clear.&lt;/p>
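One sketch of doing that (the excluded_ids table name here is hypothetical): keep the pruned list in its own table and anti-join against it, so the list lives in one place and can be updated as data changes.

```sql
-- One-time setup: persist the pruned list instead of pasting it
-- into every query
CREATE TABLE excluded_ids (id integer PRIMARY KEY);
INSERT INTO excluded_ids VALUES (34723), (42735), (32321);  -- etc.

-- Every query can now re-use it
SELECT foo
FROM bar
WHERE NOT EXISTS (
    SELECT 1 FROM excluded_ids e WHERE e.id = bar.id
);
```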
&lt;h3 id="what-else" >
&lt;div>
What else
&lt;/div>
&lt;/h3>
&lt;p>I&amp;rsquo;m sure there&amp;rsquo;s plenty more; I suspect within a few minutes of sitting down with someone they could point out some other bad habits. While I know at least some of mine, I often still accept the trade-off. What are yours? I&amp;rsquo;d love to hear them and document them for others, so hopefully they can avoid developing the same bad habits. Let me know: &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a>&lt;/p>
</description></item><item><title>CX – Conference Experience (Facilitating Communication)</title><link>/2013/04/29/CX-Conference-Experience-Facilitating-Communication/</link><pubDate>Mon, 29 Apr 2013 12:55:56 -0800</pubDate><guid>/2013/04/29/CX-Conference-Experience-Facilitating-Communication/</guid><description>&lt;p>Following up on my earlier post about CX or Conference Experience – I&amp;rsquo;m going to dig in a bit on how you get good conversation to happen. In the past two years I&amp;rsquo;ve been to nearly 20 conferences, I&amp;rsquo;ve been to conferences with great talks, with great parties, with great swag, and hands down my favorite conferences have always been a result of great conversation. With the number of talks that are recorded and immediately available online after, &lt;em>what can I say I&amp;rsquo;m a hallway track guy&lt;/em>.&lt;/p>
&lt;p>I&amp;rsquo;ve seen a number of conferences intentionally design around this concept, in some ways the unconference is purely a hallway track conference. I&amp;rsquo;ve also seen conferences that weren&amp;rsquo;t clearly planned for this and have pulled off some of the best situations where people turn their phones off and engage in real conversation.&lt;/p>
&lt;h2 id="breaks" >
&lt;div>
Breaks
&lt;/div>
&lt;/h2>
&lt;p>Breaks are always a hard part to balance. Every conference organizer I&amp;rsquo;ve talked to felt they had the perfect mix of breaks. If you&amp;rsquo;re designing for conversation then you need to allow time for conversation to actually happen. Whether at a multi-track or single-track conference, 15 minutes for breaks doesn&amp;rsquo;t give you the time to actually engage in conversation. If multi-track, it&amp;rsquo;s only time enough to grab a drink then head to the next room. If single-track, it&amp;rsquo;s time to have the entire room leave and then come back.&lt;/p>
&lt;p>Even 30 minute breaks fall into this category. At a multitrack conference I&amp;rsquo;m usually saying hi to enough people during a 30 minute break that I don&amp;rsquo;t have the chance to really get deep into a conversation. If I do it usually results in skipping the next talk.&lt;/p>
&lt;h2 id="staggered-talks" >
&lt;div>
Staggered Talks
&lt;/div>
&lt;/h2>
&lt;h2 id="locations-for-conversations" >
&lt;div>
Locations for Conversations
&lt;/div>
&lt;/h2>
&lt;p>Having places for people to get away and talk is critical; in a venue that only has 1 main room and no convenient places for a quick coffee or beer it becomes very hard for conversations to happen. If it&amp;rsquo;s on the agenda to facilitate such things that&amp;rsquo;s great, but going a step further and allowing for it to happen naturally is even better. On more than one occasion I&amp;rsquo;ve been mid conversation with someone, and we cut it off agreeing to pick it back up at the next break, only to never locate each other again for the rest of the conference. I&amp;rsquo;d have just as much preferred to continue the conversation there, but sadly we were in the middle of a talk room with nowhere else feasible to go.&lt;/p>
&lt;h2 id="water" >
&lt;div>
Water
&lt;/div>
&lt;/h2>
&lt;p>This one caught me by surprise&amp;hellip; The event was &lt;a href="http://py.codeconf.com/">PyCodeConf in Miami&lt;/a> and it was one of the evening activities – a pool party with food, drinks, and a mariachi band. Early in the evening, inevitably, someone was thrown into the pool (by friends, in general good nature), thus ruining a lovely iPhone. As a result I&amp;rsquo;m pretty sure most people, just as I did, went to their rooms to put their cell phones away, or set them near their stuff. Much of the rest of the evening was entirely internet and Twitter free, causing lots of uninterrupted conversations to happen and resulting in a pool full of geeks.&lt;/p>
&lt;h2 id="welcoming-noobs" >
&lt;div>
Welcoming noobs
&lt;/div>
&lt;/h2>
&lt;p>The big difference between just assuming this will happen and actively working to facilitate it is who gets involved in the conversation. For a first-time presenter there&amp;rsquo;s a wealth of nerves about talking. For a first-time attendee there&amp;rsquo;s a wealth of knowledge to soak up. First-time attendees may not feel as comfortable interjecting themselves into a conversation, approaching presenters, or talking to someone they&amp;rsquo;ve been following for years – &lt;em>yet they have just as much to add as anyone else&lt;/em>.&lt;/p>
&lt;p>Creating places where more conversation is happening helps ease this and build a better community.&lt;/p></description></item><item><title>CX – Conference Experience (People)</title><link>/2013/04/29/CX-Conference-Experience-People/</link><pubDate>Mon, 29 Apr 2013 12:55:56 -0800</pubDate><guid>/2013/04/29/CX-Conference-Experience-People/</guid><description>&lt;p>Following up on my earlier post about CX or Conference Experience – I&amp;rsquo;m going to dig in a bit on getting the right people there. There&amp;rsquo;s a lot of different ways to approach this from having a good ratio of:&lt;/p>
&lt;ul>
&lt;li>Designers to Developers&lt;/li>
&lt;li>Females to Males&lt;/li>
&lt;li>Noobs to well known community members&lt;/li>
&lt;li>Overall lack of suits&lt;/li>
&lt;/ul>
&lt;p>While this is by no means a guide to solving all of the above problems, there are some basics.&lt;/p>
&lt;h2 id="talks" >
&lt;div>
Talks
&lt;/div>
&lt;/h2>
&lt;p>Quality talks are obviously important. A conference should at whatever cost ensure that the talks are good. This can happen in a variety of ways, but regardless of the method you should ensure talks are worthwhile, since this is largely what most people are paying to attend.&lt;/p>
&lt;h3 id="invite-only" >
&lt;div>
Invite-only
&lt;/div>
&lt;/h3>
&lt;p>Perhaps the easiest way to do this is via invite only, to speakers you know will do a good job. The downside of this of course is that you must already know enough people to fill out a good agenda, and you also limit the ability for others to contribute.&lt;/p>
&lt;h3 id="coachingpracticing" >
&lt;div>
Coaching/Practicing
&lt;/div>
&lt;/h3>
&lt;p>This is not an either/or option with any of the other pieces for getting a good agenda. Having presenters do a trial run can ensure a minimum level of quality, and working with them to coach them can help make the talk even more effective. It&amp;rsquo;s likely that the organizers know the audience as well as anyone, so no one is better placed to help with this. Of course this is a time sink for both parties, but it can give good returns. It could also be done independent of a specific conference, such as through &lt;a href="http://speakup.io/">speakup&lt;/a>.&lt;/p>
&lt;p>&lt;em>At the very least, getting a quick run-through, outline, or something of that nature can ensure that a presenter doesn&amp;rsquo;t fill a 45 minute talk slot with only 5 minutes of content.&lt;/em>&lt;/p>
&lt;h3 id="open-cfp" >
&lt;div>
Open CFP
&lt;/div>
&lt;/h3>
&lt;p>Likely the most common approach to getting speakers is an entirely open CFP. It&amp;rsquo;s typical then that either the organizers or a speaker selection committee discusses and makes selections. This can usually strike a nice balance between experienced speakers known to be good and newer, less experienced speakers with great potential.&lt;/p>
&lt;h3 id="open-open-cfp" >
&lt;div>
Open Open CFP
&lt;/div>
&lt;/h3>
&lt;p>Going even further is an open CFP where all talks are published after the CFP and then voted on. While this does a great deal to ensure transparency, it doesn&amp;rsquo;t necessarily ensure there&amp;rsquo;s a great lineup of speakers. Being able to write an interesting talk proposal is an entirely separate skill from delivering an interesting talk.&lt;/p>
&lt;h3 id="blind" >
&lt;div>
Blind?
&lt;/div>
&lt;/h3>
&lt;p>There&amp;rsquo;s a common question lately, for both the open and open open CFPs, of whether to do blind review of the talks. The logic here is an attempt to be entirely fair versus having some bias. This is an understandable goal, but it can come at the expense of quality. Truth be told, I&amp;rsquo;m not entirely sure of a way to balance this.&lt;/p>
&lt;h2 id="choosing" >
&lt;div>
Choosing
&lt;/div>
&lt;/h2>
&lt;p>There&amp;rsquo;s definitely not a one-size-fits-all. If your goal is to provide a good list of talks then you should keep that in mind in how you decide your talks. If your goal is to pull others in then it should be shaped differently. Zach Holman recently talked a bit about this and had some interesting ideas to &lt;a href="http://zachholman.com/posts/the-conference-circuit/">minimize risk for new speakers&lt;/a>.&lt;/p></description></item><item><title>CX – Conference Experience (Talks)</title><link>/2013/04/27/CX-Conference-Experience-Talks/</link><pubDate>Sat, 27 Apr 2013 12:55:56 -0800</pubDate><guid>/2013/04/27/CX-Conference-Experience-Talks/</guid><description>&lt;p>A couple of weekends ago I had the great opportunity to attend &lt;a href="http://lessconf.com">lessconf&lt;/a>. It was an all around great conference, and as a result of the greatness I ended up having a conversation with a few people around conference experience. I must give much of the credit to &lt;a href="http://twitter.com/swiftalphaone">Swift&lt;/a>, as he mentioned he&amp;rsquo;d already been thinking a lot about this since &lt;a href="http://waza.heroku.com">Waza&lt;/a>. In general it feels like there&amp;rsquo;s a few key themes that any conference should focus on, then a lot of small things that can really push it over the top. Here&amp;rsquo;s a few:&lt;/p>
&lt;p>In general the key areas for any great conference are:&lt;/p>
&lt;ul>
&lt;li>Talks&lt;/li>
&lt;li>Great people&lt;/li>
&lt;li>Ensuring communication happens&lt;/li>
&lt;li>Bonus points&lt;/li>
&lt;/ul>
&lt;p>Digging in deeper on the first area&amp;hellip;&lt;/p>
&lt;h2 id="talks" >
&lt;div>
Talks
&lt;/div>
&lt;/h2>
&lt;p>Quality talks are obviously important. A conference should at whatever cost ensure that the talks are good. This can happen in a variety of ways, but regardless of the method you should ensure talks are worthwhile, since this is largely what most people are paying to attend.&lt;/p>
&lt;h3 id="invite-only" >
&lt;div>
Invite-only
&lt;/div>
&lt;/h3>
&lt;p>Perhaps the easiest way to do this is via invite only, to speakers you know will do a good job. The downside of this of course is that you must already know enough people to fill out a good agenda, and you also limit the ability for others to contribute.&lt;/p>
&lt;h3 id="coachingpracticing" >
&lt;div>
Coaching/Practicing
&lt;/div>
&lt;/h3>
&lt;p>This is not an either/or option with any of the other pieces for getting a good agenda. Having presenters do a trial run can ensure a minimum level of quality, and working with them to coach them can help make the talk even more effective. It&amp;rsquo;s likely that the organizers know the audience as well as anyone, so no one is better placed to help with this. Of course this is a time sink for both parties, but it can give good returns. It could also be done independent of a specific conference, such as through &lt;a href="http://speakup.io/">speakup&lt;/a>.&lt;/p>
&lt;p>&lt;em>At the very least, getting a quick run-through, outline, or something of that nature can ensure that a presenter doesn&amp;rsquo;t fill a 45 minute talk slot with only 5 minutes of content.&lt;/em>&lt;/p>
&lt;h3 id="open-cfp" >
&lt;div>
Open CFP
&lt;/div>
&lt;/h3>
&lt;p>Likely the most common approach to getting speakers is an entirely open CFP. It&amp;rsquo;s typical then that either the organizers or a speaker selection committee discusses and makes selections. This can usually strike a nice balance between experienced speakers known to be good and newer, less experienced speakers with great potential.&lt;/p>
&lt;h3 id="open-open-cfp" >
&lt;div>
Open Open CFP
&lt;/div>
&lt;/h3>
&lt;p>Going even further is an open CFP where all talks are published after the CFP and then voted on. While this does a great deal to ensure transparency, it doesn&amp;rsquo;t necessarily ensure there&amp;rsquo;s a great lineup of speakers. Being able to write an interesting talk proposal is an entirely separate skill from delivering an interesting talk.&lt;/p>
&lt;h3 id="blind" >
&lt;div>
Blind?
&lt;/div>
&lt;/h3>
&lt;p>There&amp;rsquo;s a common question lately, for both the open and open open CFPs, of whether to do blind review of the talks. The logic here is an attempt to be entirely fair versus having some bias. This is an understandable goal, but it can come at the expense of quality. Truth be told, I&amp;rsquo;m not entirely sure of a way to balance this.&lt;/p>
&lt;h2 id="choosing" >
&lt;div>
Choosing
&lt;/div>
&lt;/h2>
&lt;p>There&amp;rsquo;s definitely not a one-size-fits-all. If your goal is to provide a good list of talks then you should keep that in mind in how you decide your talks. If your goal is to pull others in then it should be shaped differently. Zach Holman recently talked a bit about this and had some interesting ideas to &lt;a href="http://zachholman.com/posts/the-conference-circuit/">minimize risk for new speakers&lt;/a>.&lt;/p></description></item><item><title>Heroku's Acquisition 2 Years Later</title><link>/2013/04/18/Herokus-Acquisition-2-Years-Later/</link><pubDate>Thu, 18 Apr 2013 12:55:56 -0800</pubDate><guid>/2013/04/18/Herokus-Acquisition-2-Years-Later/</guid><description>&lt;p>Just over two years ago, Heroku was acquired. I was around and peripheral to this just before the acquisition and came on board only barely after. While there is a large group of people that have been there longer than I have (several for 4+ years), I&amp;rsquo;m still commonly asked how things have changed, how things work, and other questions of that nature. I wrote about some of these processes over a year ago in the months after joining Heroku, around our &lt;a href="/2011/12/02/how-heroku-works-hiring/">hiring&lt;/a>, our &lt;a href="/2011/11/02/how-heroku-works-teams-tools/">teams&lt;/a>, and &lt;a href="/2011/11/07/how-heroku-works-maker-day/">how we work&lt;/a>. Many of these things haven&amp;rsquo;t changed, and yet almost always at a conference I&amp;rsquo;m asked how things are different since being acquired.&lt;/p>
&lt;p>Here&amp;rsquo;s my personal take (and while I don&amp;rsquo;t typically include this – &lt;em>to be safe, this is not an official Heroku view of what&amp;rsquo;s changed&lt;/em>).&lt;/p>
&lt;h3 id="are-things-different-since-the-acquisition" >
&lt;div>
Are things different since the acquisition?
&lt;/div>
&lt;/h3>
&lt;p>Heroku in many ways operates like a wholly owned subsidiary or an independent business within Salesforce. We still have our own office space, our own IT (if you can even call it that), and in general an entirely independent workflow. I have a Salesforce email account that automatically forwards to my Heroku Google Apps account, and that&amp;rsquo;s about as much as I know about it. In fact, we&amp;rsquo;re currently preparing for our new office and it&amp;rsquo;s planned to be our home for somewhere in the next 5-10 years.&lt;/p>
&lt;p>At the same time some things do flow through Salesforce, some logistical, some strategic. First, perhaps the most unfortunate part – hiring. This has not really changed &lt;a href="/2011/12/02/how-heroku-works-hiring/">how we hire&lt;/a>, but rather once you are coming on board there&amp;rsquo;s more paperwork. In general this is the most painful part: instead of one or two pieces of paper there are a few to sign. At the same time some of these come with some greater gains such as benefits. To be honest I don&amp;rsquo;t fully recall what all of the paperwork is, but either way there is a bit more of it. The other area we really see this effect is expense accounts. All expenses flow through Concur, which in my opinion is a pretty good solution. I&amp;rsquo;ve used many worse expense solutions and seen few if any better.&lt;/p>
&lt;h3 id="do-they-influence-the-product" >
&lt;div>
Do they influence the product?
&lt;/div>
&lt;/h3>
&lt;p>&lt;em>Not really&lt;/em>&lt;/p>
&lt;p>In general we at Heroku aim to have some broader alignment around what we&amp;rsquo;re trying to accomplish. Salesforce believes in helping their customers become a &lt;a href="http://www.youtube.com/watch?v=BwaZwm2dTCA">customer company&lt;/a>. We believe that developers are worthy of great experiences. Salesforce believes in ensuring its customers are successful. Heroku believes that developers should focus on adding value for their customers, not just keeping the lights on. In all of these things it&amp;rsquo;s important to have alignment. Not for the company we&amp;rsquo;re building for the next six weeks or six months, but for the next six years. Given both Salesforce&amp;rsquo;s and Heroku&amp;rsquo;s worldview I believe what we&amp;rsquo;re trying to accomplish is in good alignment.&lt;/p>
&lt;p>But does Salesforce influence our product roadmap? No, we aim to listen to our customers&amp;rsquo; problems and build to solve those problems, and through that improve the way software is delivered. At the time Salesforce acquired us, the pieces of &lt;a href="https://blog.heroku.com/archives/2011/5/31/celadon_cedar">Cedar&lt;/a> were already in motion, including Procfile support, logplex, and other pieces. Later Cedar was fully productized, which realized a goal we&amp;rsquo;d long had – making other languages such as Node, &lt;a href="https://blog.heroku.com/archives/2011/8/25/java">Java&lt;/a>, and &lt;a href="https://blog.heroku.com/archives/2011/9/28/python_and_django">Python&lt;/a> available. This wasn&amp;rsquo;t driven because Salesforce wanted Java, but because we saw value in delivering the same value of the platform to other communities.&lt;/p>
&lt;h3 id="have-we-changed" >
&lt;div>
Have we changed?
&lt;/div>
&lt;/h3>
&lt;p>Sure, we&amp;rsquo;ve changed, but I&amp;rsquo;d surmise very little if at all due to Salesforce. We&amp;rsquo;ve grown from a 20 person company to now over 100. We were a primarily local team and now have people all over. Company offsites get a bit harder to coordinate with 100 people, and finding a date that absolutely everyone can make is nearly impossible. Many of our changes are more strictly a result of growth than they are because of Salesforce.&lt;/p>
&lt;h3 id="i-dont-get-it-why-acquire-you-then" >
&lt;div>
I don&amp;rsquo;t get it, why acquire you then?
&lt;/div>
&lt;/h3>
&lt;p>Because we&amp;rsquo;re aligned in a much bigger vision, we can both work together long term to accomplish our goal.&lt;/p>
&lt;p>&lt;em>The internet is changing the world; software is everywhere.&lt;/em>&lt;/p>
&lt;p>I believe Salesforce truly understands this. Heroku does as well (though, at the time of the acquisition I fully suspect many Herokai held their breath to see how it would all go). Six weeks went by without trying to &amp;lsquo;change us&amp;rsquo;, then six months, and here we are over two years later. In many ways so many of us have started to better understand that Salesforce truly understands the above.&lt;/p>
&lt;p>If you understand the above, then you know that delivering software and improving that process gives any business a competitive advantage. This is at the core of the value we aim to provide. E.g. If you ranked all companies in terms of how strong they aligned with Heroku, Salesforce might be at the &lt;em>very&lt;/em> top.&lt;/p></description></item><item><title>Using array_agg in Postgres – powerful and flexible</title><link>/2013/04/17/Using-array_agg-in-Postgres-powerful-and-flexible/</link><pubDate>Wed, 17 Apr 2013 12:55:56 -0800</pubDate><guid>/2013/04/17/Using-array_agg-in-Postgres-powerful-and-flexible/</guid><description>&lt;p>In almost any application it&amp;rsquo;s common to want to aggregate some set of values together, commonly in a comma separated form. Most developers do this by running a query to get much of the raw data, looping over the data and pushing it into a set, appending each new value to the appropriate key. Hopefully, it&amp;rsquo;s not a surprise that there&amp;rsquo;s a much better way to do this with PostgreSQL.&lt;/p>
&lt;p>Postgres has a flexible and robust &lt;a href="/2012/08/20/arrays-in-postgres/">array datatype&lt;/a> that comes with a variety of functions. Even without taking advantage of the array datatype in &lt;a href="/2012/11/06/django-and-arrays/">your application&lt;/a>, you can still take advantage of some of the functions to get the functionality you need. Let&amp;rsquo;s take a look at an example schema and use case.&lt;/p>
&lt;h3 id="an-example" >
&lt;div>
An example
&lt;/div>
&lt;/h3>
&lt;p>Given a project management application, you may have &lt;code>users&lt;/code> who have &lt;code>projects&lt;/code> that have &lt;code>tasks&lt;/code>. An example piece of functionality might be to send an email with a list of all projects that have tasks that are past their due dates of completion. Your schema might look something like this:&lt;/p>
&lt;pre>&lt;code> # \d users
Table &amp;quot;public.users&amp;quot;
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer | not null
email | character varying(255) |
...
# \d projects
Table &amp;quot;public.projects&amp;quot;
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer | not null
user_id | integer | not null
name | character varying(255) | not null
...
# \d tasks
Table &amp;quot;public.tasks&amp;quot;
Column | Type | Modifiers
--------------+-----------------------------+-----------
id | integer | not null
project_id | integer | not null
completed_at | timestamp without time zone |
due_at | timestamp without time zone |
...
&lt;/code>&lt;/pre>
&lt;p>To get a list of all projects that have tasks that haven&amp;rsquo;t been completed, you would start with something like:&lt;/p>
&lt;pre>&lt;code>SELECT
projects.name
FROM
projects,
tasks
WHERE projects.id = tasks.project_id
AND tasks.due_at &amp;gt; tasks.completed_at
AND tasks.due_at &amp;gt; now()
&lt;/code>&lt;/pre>
&lt;p>This would give you a list of projects, which you could then easily join with users:&lt;/p>
&lt;pre>&lt;code>SELECT
users.email,
projects.name
FROM
projects,
tasks,
users
WHERE projects.id = tasks.project_id
AND tasks.due_at &amp;gt; tasks.completed_at
AND tasks.due_at &amp;gt; now()
AND users.id = projects.user_id
&lt;/code>&lt;/pre>
&lt;p>At this point you&amp;rsquo;ve got everything you need to pull this up into Ruby, Python, or another language of your choice and then build the full set. However if this is hundreds or even thousands of results you&amp;rsquo;ll be spending more time than necessary grouping this data for a sensible email. With three other small changes you can have this already formatted for you to immediately send off in an email. The first is using a handy function called &lt;code>array_agg&lt;/code> which will aggregate items so you can format them how you wish. The second is ensuring you&amp;rsquo;re grouping correctly. Finally you&amp;rsquo;ll want to convert the array to a string with &lt;code>array_to_string&lt;/code> so the data is formatted in a clean way for you.&lt;/p>
&lt;p>Looking at it all put together:&lt;/p>
&lt;pre>&lt;code>SELECT
users.email,
array_to_string(array_agg(projects.name), ',') as projects
FROM
projects,
tasks,
users
WHERE projects.id = tasks.project_id
AND tasks.due_at &amp;gt; tasks.completed_at
AND tasks.due_at &amp;gt; now()
AND users.id = projects.user_id
GROUP BY
users.email
&lt;/code>&lt;/pre>
&lt;p>This would give you a nice clean result of projects that have overdue tasks that you could then send to the user in an email:&lt;/p>
&lt;pre>&lt;code> email | projects
---------------------------+-------------------
craig.kerstiens@gmail.com | blog, timetracker
craig@heroku.com | foo, bar, baz
&lt;/code>&lt;/pre></description></item><item><title>Scaling Evangelism – Creating Advocates</title><link>/2013/04/16/Scaling-Evangelism-Creating-Advocates/</link><pubDate>Tue, 16 Apr 2013 12:55:56 -0800</pubDate><guid>/2013/04/16/Scaling-Evangelism-Creating-Advocates/</guid><description>&lt;p>The first area I tend to think about when it comes to developer marketing is around advocates. A simple definition of an advocate is &lt;em>someone that speaks/writes in favor of some thing&lt;/em>.&lt;/p>
&lt;p>Creating advocates is a means of creating more of myself to go out and talk loudly and, ideally, effectively. However, advocates do take time to cultivate, and the effort is hard to track. But by cultivating advocates I&amp;rsquo;m able to scale what would otherwise be a bottleneck of my own personal time.&lt;/p>
&lt;h3 id="mentorship" >
&lt;div>
Mentorship
&lt;/div>
&lt;/h3>
&lt;p>The people in your group of 3 you&amp;rsquo;ll have nearly daily communications with, even if quick. These are the first people you think to work and collaborate with. It&amp;rsquo;s the case with all of these groups, but especially so here, that there may be some strong mutual benefit. With both the groups of 3 and 12 you&amp;rsquo;ll be looking to:&lt;/p>
&lt;ul>
&lt;li>Co-author content with&lt;/li>
&lt;li>Collaborate on projects&lt;/li>
&lt;li>Help amplify their content&lt;/li>
&lt;/ul>
&lt;h3 id="helping-them-succeed" >
&lt;div>
Helping them Succeed
&lt;/div>
&lt;/h3>
&lt;p>For a few years I&amp;rsquo;ve followed a similar process to this. Essentially creating advocates is done through first &lt;strong>making them successful&lt;/strong>, then empowering them to talk about their success. I&amp;rsquo;ve often done this by giving people a private channel to me – through IM, SMS, phone, Skype, or even in person. Sure, it&amp;rsquo;s easy enough to get my work email address – it&amp;rsquo;s a pretty easy one to guess – and even at that address I do aim to respond to every request. These channels in reality don&amp;rsquo;t make me more available, but they do allow a different type of communication to happen.&lt;/p>
&lt;p>My personal approach has found it more out of place to ask how someone&amp;rsquo;s weekend was over email – IM or Skype make this easy. Broader questions of how someone&amp;rsquo;s actually doing in life are generally much easier to ask over a coffee or a beer. This allows for a deeper relationship; at the same time I can ensure everyone I interact with is successful. It&amp;rsquo;s easier to ensure someone&amp;rsquo;s successful if they&amp;rsquo;re willing to talk to you; actually connecting with people enables this.&lt;/p>
&lt;h3 id="communicate" >
&lt;div>
Communicate
&lt;/div>
&lt;/h3>
&lt;p>Once someone&amp;rsquo;s been successful there&amp;rsquo;s usually some value in their story. This doesn&amp;rsquo;t need to come in a marketing-ese approach though. Instead contribute the value of what you did back to the broader community with &lt;a href="http://blog.mailgun.net/post/how-tealeaf-academy-increased-student-engagement-3x/">clear steps for how they can get the same benefit&lt;/a>. For many developers, talking about their own success brings on a bit of imposter syndrome. While this isn&amp;rsquo;t always the case, if it is you can work with someone to highlight the value in what they&amp;rsquo;ve done. This could range from some simple encouragement to actually helping review drafts of blog posts.&lt;/p>
&lt;p>Of course, once someone&amp;rsquo;s started talking about what they were able to accomplish, help them celebrate the success. Yelling from the rooftops only helps to further their efforts and give them the confidence to do it all over again.&lt;/p>
&lt;h3 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h3>
&lt;p>To create advocates you don&amp;rsquo;t have to write a list of people you wish to shoot for, but realizing there are limitations on your time can help you be more intentional. Knowing that where you spend your time and who you spend it with has a cost can allow you to be more explicit about ensuring you&amp;rsquo;re providing value to them; from there it should naturally run its course. Want to discuss this more? Email me at &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens at gmail.com&lt;/a>.&lt;/p></description></item><item><title>Doing Marketing (for developers) Differently</title><link>/2013/04/12/perspective-on-developer-marketing/</link><pubDate>Fri, 12 Apr 2013 12:55:56 -0800</pubDate><guid>/2013/04/12/perspective-on-developer-marketing/</guid><description>&lt;p>As developers we can tend to be a fickle bunch; especially in certain open source communities. We like to see intelligence and systems applied to things, hope for a better world through making things open, and appreciate when others relate to our world as we often attempt to relate to theirs. Marketing is often a loaded word when it comes to developers – leaving mixed feelings about webinars, email campaigns, and the like.&lt;/p>
&lt;p>&lt;em>A quick clarification in terms of marketing I&amp;rsquo;m referring to product marketing, but of a technical product. This is based on the idea that &lt;a href="http://thenewkingmakers.com/">developers are the new kingmakers&lt;/a> and when marketing a product to them it needs to be done differently.&lt;/em>&lt;/p>
&lt;p>While many of the above steps can add immense value, they often miss when it comes to developers. By changing these processes only slightly there can be much more efficiency in connecting with and actually delivering the value you hope for to developers.&lt;/p>
&lt;h3 id="email-marketing" >
&lt;div>
Email Marketing
&lt;/div>
&lt;/h3>
&lt;p>This is an area where more and more companies are already starting to solve the right problems. Companies like &lt;a href="http://intercom.io">intercom.io&lt;/a> and &lt;a href="http://customer.io">customer.io&lt;/a> are allowing for better tracking of what users are doing and notifying them at the &lt;em>right&lt;/em> time instead of via massive impersonal campaigns. Developers build systems that take into account similar factors every day; you should be doing the same.&lt;/p>
&lt;p>Emails that are sent at the wrong time or miss on what someone is trying to do can often do more damage than good towards building a rapport. Pushing content that solves a recent issue or problem, at a relevant time, should be table stakes for any email that&amp;rsquo;s sent.&lt;/p>
&lt;h3 id="community-engagement" >
&lt;div>
Community Engagement
&lt;/div>
&lt;/h3>
&lt;p>Whether at conferences or in online collaborations, developers are more social than so many give them credit for. We appreciate when things are done to improve the greater good, particularly for developers, but often for the world as well. This ranges from supporting individual developers or projects financially, such as through &lt;a href="http://www.gittip.com">gittip&lt;/a> or &lt;a href="http://www.kickstarter.com/projects/andrewgodwin/schema-migrations-for-django">certain kickstarter projects&lt;/a>, all the way to employing developers to &lt;a href="http://www.twitter.com/tenderlove">work full time on open source&lt;/a>.&lt;/p>
&lt;h3 id="being-better-than-webinars" >
&lt;div>
Being Better Than Webinars
&lt;/div>
&lt;/h3>
&lt;p>After doing a webinar for a particular audience I had a &lt;a href="http://www.heroku.com">Heroku&lt;/a> co-worker come up to me. This individual has contributed many ideas that have become core functionality such as the &lt;a href="#">Cedar stack&lt;/a> and is generally open minded and listens very well. He came up to me and genuinely asked, &amp;ldquo;so what is a webinar?&amp;rdquo; I explained it&amp;rsquo;s like a live webcast with an opportunity for questions at the end. He was quiet for a minute and responded, &amp;ldquo;Why can&amp;rsquo;t we just say that?&amp;rdquo;&lt;/p>
&lt;p>There are some words that simply send off the wrong signals. Webinar rather clearly is one of those. As a community we&amp;rsquo;ve shown that we have a desire for this kind of content. The &lt;a href="http://www.lanyrd.com">conference community&lt;/a> has continued to grow steadily, and while there are many parts to every conference, a pretty common one is the talk track itself. A talk at a conference is essentially a live in-person webinar; of course the experience is often a bit richer.&lt;/p>
&lt;p>Instead of the term webinar what is wrong with online office hours or webcasts with an opportunity for questions? Even doing the same thing with a different name is underdelivering on what we&amp;rsquo;re capable of.&lt;/p>
&lt;p>I welcome hearing from others directly at &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a> on what they&amp;rsquo;re doing to engage with developers thats working. Not only working in the sense of eyeballs or dollars, but in adding value, in improving communities, in make the world better for developers. If there&amp;rsquo;s enough interest would be happy to post more detail around things I&amp;rsquo;ve seen work and what I hear works for others.&lt;/p></description></item><item><title>Why I Blog</title><link>/2013/03/31/Why-I-Blog/</link><pubDate>Sun, 31 Mar 2013 12:55:56 -0800</pubDate><guid>/2013/03/31/Why-I-Blog/</guid><description>&lt;p>I blog because I&amp;rsquo;m lazy. There&amp;rsquo;s more too it though: In any given day I may explain something to someone, the first time I do this I make a bit of a mental note. The second time I do this, especially within a short time frame I make a physical note of this in the form of the title of a blog post. Once I&amp;rsquo;m already to a second time of doing this its almost inevitable I&amp;rsquo;ll continue repeating myself – and its valuable to others.&lt;/p>
&lt;h3 id="becoming-replaceable" >
&lt;div>
Becoming replaceable
&lt;/div>
&lt;/h3>
&lt;p>My goal in doing this is actually to become heavily replaceable. Keeping information locked away means I&amp;rsquo;m the only one capable of doing it. Making myself highly replaceable means I can continue to work on new and interesting things.&lt;/p>
&lt;h3 id="lots-can-be-shared" >
&lt;div>
Lots can be shared
&lt;/div>
&lt;/h3>
&lt;p>I don&amp;rsquo;t recall exactly where I read it, I believe it may have been &lt;a href="http://www.twitter.com/monkchips">James Governor&lt;/a>, but any email that doesn&amp;rsquo;t contain proprietary info or trade secrets should be blogged. I fully expect the sentiment around this is something to the effect that anything that contains good ideas/learning/processes can also be helpful to others.&lt;/p>
&lt;h3 id="distilling-information" >
&lt;div>
Distilling information
&lt;/div>
&lt;/h3>
&lt;p>Every time I explain something I may linger on one piece too long, hit something that doesn&amp;rsquo;t need to be discussed, or miss something entirely. Putting an idea or guide together in written form lets you re-read it, which is hard to do when simply explaining verbally. If you&amp;rsquo;ve videotaped yourself before for public speaking practice you know this is an awkward but valuable experience. It&amp;rsquo;s much easier to practice as an exercise in writing.&lt;/p>
&lt;h3 id="economies-of-scale" >
&lt;div>
Economies of scale
&lt;/div>
&lt;/h3>
&lt;p>While in some ways I am absolutely lazy, when I blog I get economies of scale I couldn&amp;rsquo;t otherwise have. On a high day I may have 15 one-on-one conversations. Every post has the opportunity to become a one-on-one conversation.&lt;/p>
&lt;p>&lt;em>One-on-one conversation is my preferred method, rather than a large group. In a related area I&amp;rsquo;ve been having an ongoing conversation with &lt;a href="http://www.twitter.com/rwdaigle">Ryan Daigle&lt;/a> on whether blog posts should have comments. I opt for having you reach out to me directly via &lt;a href="mailto:craig.kerstiens@gmail.com">email&lt;/a>; he believes comments should be available. Putting this in words crystallizes that, to me, it&amp;rsquo;s just relative to how you prefer to interact.&lt;/em>&lt;/p>
&lt;h3 id="summary" >
&lt;div>
Summary
&lt;/div>
&lt;/h3>
&lt;p>As with any other post, this is one conversation I&amp;rsquo;ve now had a few times over. Hopefully this gives some basis to why I feel others should contribute content as well. Whether you want to make yourself dispensable, refine your thoughts, or reach new economies of scale, I believe it&amp;rsquo;s a worthwhile exercise for many. If you have other reasons you feel it&amp;rsquo;s valuable, as always please &lt;a href="mailto:craig.kerstiens@gmail.com">reach out&lt;/a>.&lt;/p></description></item><item><title>Prioritizing and Planning within Heroku Postgres</title><link>/2013/03/13/planning-and-prioritizing/</link><pubDate>Wed, 13 Mar 2013 12:55:56 -0800</pubDate><guid>/2013/03/13/planning-and-prioritizing/</guid><description>&lt;p>Over a year ago I blogged about Heroku&amp;rsquo;s approach to &lt;a href="http://www.craigkerstiens.com/2011/11/02/how-heroku-works-teams-tools/">Teams and Tools&lt;/a>. Since that time Heroku has grown from around 25 people to over 100, and we&amp;rsquo;ve continued to iterate and find new tools that work for how we do things. For many of the &lt;a href="http://www.amazon.com/Inspired-Create-Products-Customers-Love/dp/0981690408?ref=as_li_tf_tl?ie=UTF8&amp;amp;tag=mypred-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0932633439">product management&lt;/a> and software engineering books I&amp;rsquo;ve read, I&amp;rsquo;ve yet to find something that helps a team prioritize in a fashion that feels right.&lt;/p>
&lt;p>One process emerged nearly a year ago from within the &lt;a href="https://postgres.heroku.com">Heroku Postgres&lt;/a> team and is now followed by many others. Within a team this process is now commonly conducted every 6 months. Let&amp;rsquo;s take a look at how this process works.&lt;/p>
&lt;h3 id="it-starts-with-ideas" >
&lt;div>
It Starts with Ideas
&lt;/div>
&lt;/h3>
&lt;p>Hopefully having ideas of things to work on isn&amp;rsquo;t a problem; if it is, just go spend some time with customers – listen to their problems, see how they use the product, then come back and write down the ideas. For most teams this is simply an exercise of thinking back and writing it down. Some teams at Heroku have resorted to keeping running backlogs of things they&amp;rsquo;d like to do. We do this by keeping a Trello board with columns for:&lt;/p>
&lt;ul>
&lt;li>New ideas&lt;/li>
&lt;li>Ponies&lt;/li>
&lt;li>Stallions&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Ponies and Stallions are things that would be great to do; however, a sizeable amount of work must be done on them and we&amp;rsquo;re not currently tackling them. Ponies are less sizeable and likely to get done not in the coming weeks, but perhaps in the coming months up to a year. Stallions are a large effort and may or may not get done, but they&amp;rsquo;re in the category of things we would like to be able to do.&lt;/em>&lt;/p>
&lt;p>Once you&amp;rsquo;ve got your ideas, whether in your head or on a backlog, we begin by writing them out, typically on sticky notes or index cards.&lt;/p>
&lt;h3 id="laying-it-all-out" >
&lt;div>
Laying it all out
&lt;/div>
&lt;/h3>
&lt;p>From here we create a simple grid:&lt;/p>
&lt;p>&lt;img src="https://f.v1.n0.cdn.getcloudapp.com/items/1x1J2u0g390C1u1A1R1a/Screenshot_3_8_13_10_37_AM.png" alt="grid">&lt;/p>
&lt;p>The grid has two axes: one for impact, the other for difficulty. At this point we aim to lay out every idea that we&amp;rsquo;ve already written down into a quadrant. Commonly this is done at team offsites where the team is free of distractions and able to devote appropriate time to it. Being able to accomplish this in one sitting with the team is important to having cohesion around the result.&lt;/p>
&lt;h3 id="a-plan" >
&lt;div>
A plan
&lt;/div>
&lt;/h3>
&lt;p>At this point hopefully it&amp;rsquo;s quite obvious what you want to tackle. If there&amp;rsquo;s anything in the top right it should be an easy win for you to focus on. From there we often transcribe this into a PowerPoint/Keynote document and highlight things that we will definitely aim to accomplish in the next 6 months as well as things we&amp;rsquo;re intentionally not working on. This leaves us with an artifact of both things we will work on and explicit things we won&amp;rsquo;t.&lt;/p>
&lt;p>&lt;img src="https://f.v1.n0.cdn.getcloudapp.com/items/2J3K2E0q2z2P0y0C0M3B/Screenshot_3_9_13_8_45_AM.png" alt="grid">&lt;/p>
&lt;h3 id="what-works-for-us" >
&lt;div>
What works for us
&lt;/div>
&lt;/h3>
&lt;p>In general we try to have more work than we can tackle to ensure we&amp;rsquo;re constrained in a good way and not leaving people idle. Being selective about the things we&amp;rsquo;re working on, and ensuring we&amp;rsquo;re working on the right things, works for us. We&amp;rsquo;ve found this simple exercise valuable for many teams to plan and ensure we&amp;rsquo;re working on those right things. Of course this may not work for everyone, but it aligns well with our goals and culture.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;ve got simple but unique techniques that work for your team as always would love to hear about them – &lt;a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com&lt;/a>&lt;/em>&lt;/p></description></item><item><title>Fixing Database Connections in Django</title><link>/2013/03/07/Fixing-django-db-connections/</link><pubDate>Thu, 07 Mar 2013 12:55:56 -0800</pubDate><guid>/2013/03/07/Fixing-django-db-connections/</guid><description>&lt;p>If you&amp;rsquo;re looking to get better performance from your Django apps you can check out &lt;a href="http://www.amazon.com/Pro-Django-Experts-Voice-Development/dp/1430210478?ref=as_li_tf_tl?ie=UTF8&amp;amp;tag=mypred-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0932633439">Pro Django&lt;/a>, &lt;a href="http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Smith/dp/184951030X?ref=as_li_tf_tl?ie=UTF8&amp;amp;tag=mypred-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0932633439">PostgreSQL High Performance&lt;/a>, or read some of my earlier posts on &lt;a href="http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance/">Postgres Performance&lt;/a>. All of these are of course good things to do – you can also start by correcting an incredibly common but also painful performance issue that, until 1.6, is unaddressed in Django.&lt;/p>
&lt;p>Django&amp;rsquo;s current default behavior is to establish a connection for each request within a Django application. In many cases, particularly in distributed cloud environments, this is a large portion of your response time. An example application running on &lt;a href="http://www.heroku.com">Heroku&lt;/a> shows a typical connection time of 70ms. A large part of this time is the SSL negotiation that occurs in connecting to your database, &lt;em>which is a good practice to ensure security of your data&lt;/em>. Regardless, this is a long time to spend simply establishing a connection. As a point of comparison, it&amp;rsquo;s commonly encouraged that most queries to your database run in under 10ms.&lt;/p>
&lt;p>An example that highlights this in a small lightweight application shows the bulk of a request time being within a connection displayed by &lt;a href="http://www.newrelic.com">New Relic&lt;/a>:&lt;/p>
&lt;p>&lt;img src="https://f.cl.ly/items/0X3u0e3Q3G0L19263k2Z/Screenshot_3_6_13_1_12_PM.png" alt="connection time">&lt;/p>
&lt;p>&lt;img src="https://f.cl.ly/items/1h2w450F3n0X1m1c0S38/Screenshot_2_22_13_3_18_PM-2.png" alt="connection time">&lt;/p>
&lt;p>One option to remedy this is running a connection pooler on your database side such as &lt;a href="http://www.pgpool.net/mediawiki/index.php/Main_Page">Pgpool&lt;/a> or &lt;a href="http://pgfoundry.org/projects/pgbouncer">PgBouncer&lt;/a>. In fact &lt;a href="http://www.askthepony.com/blog/2011/07/getting-django-on-heroku-prancing-8-times-faster/">Ask the Pony&lt;/a> already highlighted these potential gains. While they ran an external pooler alongside the DB, they&amp;rsquo;re essentially testing the benefits of connection pooling. This is an obvious gain and can be had in a much more lightweight form.&lt;/p>
&lt;h3 id="connection-pooling-in-django" >
&lt;div>
Connection Pooling in Django
&lt;/div>
&lt;/h3>
&lt;p>As Django establishes a connection on each request it has an opportunity to both pool connections and persist connections. There are two major options for pooling; each works quite well with Django and provides some dramatic improvements. While the first request may take the 70ms of connection time, subsequent requests show absolutely no connection time since the connection already exists. This is highlighted by these two before-and-after comparisons of the actual time spent grabbing a connection:&lt;/p>
&lt;p>&lt;img src="https://f.cl.ly/items/2H0J2x3P2L2K2n0M3i38/Screen_Shot_2013-02-22_at_8.28.21_PM.png" alt="before">
&lt;img src="https://f.cl.ly/items/0T1l1I03433u0c0t3e2o/Screen_Shot_2013-02-22_at_8.28.36_PM.png" alt="after">&lt;/p>
&lt;p>Clearly there&amp;rsquo;s plenty of value in having a persistent connection or a pool within Django itself. As of today there are a few options for that:&lt;/p>
&lt;h3 id="django-postgrespool" >
&lt;div>
Django-PostgresPool
&lt;/div>
&lt;/h3>
&lt;p>The first, &lt;a href="https://github.com/kennethreitz/django-postgrespool">Django-PostgresPool&lt;/a>, is created by &lt;a href="http://twitter.com/kennethreitz">kennethreitz&lt;/a>. As I&amp;rsquo;d generally encourage the use of &lt;a href="">dj_database_url&lt;/a>, you can easily begin using his package (once installed) with:&lt;/p>
&lt;pre>&lt;code>import dj_database_url
DATABASES = { 'default': dj_database_url.config() }
DATABASES['default']['ENGINE'] = 'django_postgrespool'
&lt;/code>&lt;/pre>
&lt;p>An important thing to note is if you&amp;rsquo;re using &lt;a href="http://south.aeracode.org/">South&lt;/a> you&amp;rsquo;ll also want to setup the adapter for it:&lt;/p>
&lt;pre>&lt;code>SOUTH_DATABASE_ADAPTERS = {
'default': 'south.db.postgresql_psycopg2'
}
&lt;/code>&lt;/pre>
&lt;h3 id="djorm-ext-pool" >
&lt;div>
djorm-ext-pool
&lt;/div>
&lt;/h3>
&lt;p>The second option, &lt;a href="https://github.com/niwibe/djorm-ext-pool">djorm-ext-pool&lt;/a>, is created by &lt;a href="http://twitter.com/niwibe">niwibe&lt;/a>. Once you&amp;rsquo;ve installed &lt;code>djorm-ext-pool&lt;/code> you then add it to your &lt;code>INSTALLED_APPS&lt;/code> within your &lt;code>settings.py&lt;/code>. From here you can set up your pool:&lt;/p>
&lt;pre>&lt;code>DJORM_POOL_OPTIONS = {
&amp;quot;pool_size&amp;quot;: 20,
&amp;quot;max_overflow&amp;quot;: 0
}
&lt;/code>&lt;/pre>
&lt;h3 id="django-db-pool" >
&lt;div>
django-db-pool
&lt;/div>
&lt;/h3>
&lt;p>The third and final option is &lt;a href="https://github.com/gmcguire/django-db-pool">django-db-pool&lt;/a>. You can set it up with:&lt;/p>
&lt;pre>&lt;code>DATABASES = {'default': dj_database_url.config()}
DATABASES['default']['ENGINE'] = 'dbpool.db.backends.postgresql_psycopg2'
DATABASES['default']['OPTIONS'] = {
'MAX_CONNS': 10
}
&lt;/code>&lt;/pre>
&lt;h3 id="gotchas" >
&lt;div>
Gotchas
&lt;/div>
&lt;/h3>
&lt;p>Each of these does work with recent versions of Django, though in some cases there are gotchas. If you&amp;rsquo;re using a production-worthy Python web server such as Gunicorn or uWSGI and running with gevent or eventlet, some edge cases can present themselves. Regardless of potential gotchas it is worth attempting this, and of course providing feedback to the maintainers and the community as you find them.&lt;/p>
&lt;h3 id="the-future" >
&lt;div>
The future
&lt;/div>
&lt;/h3>
&lt;p>Django has more recently started to directly address the large cost of establishing a connection. The first major step here is &lt;a href="https://github.com/django/django/commit/2ee21d">this patch&lt;/a> from &lt;a href="http://twitter.com/aymericaugustin">Aymeric&lt;/a>. You can find more discussion around this particular patch &lt;a href="https://groups.google.com/forum/#!topic/django-developers/NwY9CHM4xpU/discussion">here&lt;/a>. Essentially with this patch, which will land in Django 1.6, developers get a persistent connection, which will help reduce the time. If you&amp;rsquo;re interested in trying the 1.6 master branch you can do so by adding it to your requirements.txt as:&lt;/p>
&lt;pre>&lt;code>https://github.com/django/django/archive/master.zip
&lt;/code>&lt;/pre>
&lt;p>At this point it does not introduce pooling, which could allow even more gains, though I&amp;rsquo;m sure if there&amp;rsquo;s enough need it&amp;rsquo;ll be on a roadmap at some point. As it stands today, before 1.6, your best bet is one of the above options.&lt;/p></description></item><item><title>Simple database read scaling without sharding in rails</title><link>/2013/03/06/Simple-database-read-scaling-without-sharding-in-rails/</link><pubDate>Wed, 06 Mar 2013 12:55:56 -0800</pubDate><guid>/2013/03/06/Simple-database-read-scaling-without-sharding-in-rails/</guid><description>&lt;p>In an earlier post I provided a high level &lt;a href="http://www.craigkerstiens.com/2012/11/30/sharding-your-database/">overview of sharding&lt;/a>. Sharding, while a very solid approach to scaling capacity compared to relying solely on vertical scaling, can also be a time intensive one. Additionally, in some cases certain sites may only need extra capacity for a short-lived period of time. Fortunately there&amp;rsquo;s a nice middle-ground alternative for scaling capacity that works well in quite a few cases. It even has the benefit that it can potentially be used in place of sharding.&lt;/p>
&lt;p>This method scales your reads out to replica databases; on Heroku you can do this by taking advantage of followers. A follower is a read-only database on Heroku Postgres that receives asynchronous updates of your data, usually lagging only a few commits behind. This means you can write all of your data to the leader (main) database, and then read from another.&lt;/p>
&lt;p>&lt;em>While you can do this arbitrarily, there are some major benefits to doing it based on the models. This is because Postgres maintains a cache on each instance it&amp;rsquo;s running on. Though you may have the same dataset, Postgres keeps frequently accessed or queried data in the cache, giving you better performance. For more on this you can read earlier posts on &lt;a href="http://www.craigkerstiens.com/2012/11/30/sharding-your-database/">PostgreSQL Performance&lt;/a>.&lt;/em>&lt;/p>
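The core idea — writes go to the leader, reads can go to a follower — is framework-independent. Here is a minimal Python sketch of such a router; the `ReadWriteRouter` class and the string connection labels are hypothetical illustrations, not part of Octopus or any Rails gem:

```python
class ReadWriteRouter:
    """Route writes to the leader and reads to a follower replica.

    The two 'connections' here are just labels; in a real application
    they would be live database handles.
    """
    def __init__(self, leader, follower):
        self.leader = leader
        self.follower = follower

    def connection_for(self, sql):
        # Anything that mutates data must go to the leader; everything
        # else can be served by the (slightly lagging) follower.
        verb = sql.lstrip().split()[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE"):
            return self.leader
        return self.follower

router = ReadWriteRouter(leader="primary", follower="replica")
print(router.connection_for("SELECT id FROM users"))   # replica
print(router.connection_for("INSERT INTO users VALUES (1)"))  # primary
```

Octopus does this kind of routing for you at the model level, which is what the rest of this post sets up.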
&lt;h3 id="setting-it-up-with-rails" >
&lt;div>
Setting it up with Rails
&lt;/div>
&lt;/h3>
&lt;p>With a follower database created you can begin adding support for this to your application. The first thing is to add the gem to your Gemfile:&lt;/p>
&lt;pre>&lt;code>gem 'ar-octopus', :require =&amp;gt; &amp;quot;octopus&amp;quot;
&lt;/code>&lt;/pre>
&lt;p>Then of course install it with &lt;code>bundle install&lt;/code>. Now we can begin to add the code needed to have specific models access the follower, starting with the shard configuration:&lt;/p>
&lt;pre>&lt;code>octopus:
  shards:
    shard_sqlite:
      adapter: sqlite3
      database: db/db_one.sqlite3
      pool: 5
      timeout: 5000
    shard_pgsql:
      adapter: postgresql
      username: postgres
      password:
      database: db_two
      encoding: unicode
&lt;/code>&lt;/pre>
&lt;p>Then within a model you can establish the connection to the appropriate shard:&lt;/p>
&lt;pre>&lt;code>class Project &amp;lt; ActiveRecord::Base
  octopus_establish_connection(:adapter =&amp;gt; &amp;quot;sqlite3&amp;quot;, :database =&amp;gt; &amp;quot;db_one&amp;quot;)
end&lt;/code>&lt;/pre></description></item><item><title>Getting more out of psql (The PostgreSQL CLI)</title><link>/2013/02/21/more-out-of-psql/</link><pubDate>Thu, 21 Feb 2013 12:55:56 -0800</pubDate><guid>/2013/02/21/more-out-of-psql/</guid><description>&lt;p>&lt;em>After my last post I had a variety of readers reach out about many different tweaks they&amp;rsquo;d made to their workflows using psql. One person, &lt;a href="https://github.com/chanmix51/">Grégoire Hubert&lt;/a>, had a wonderful, extensive list of items. Grégoire has been a freelance web developer and has worked with PostgreSQL for some time now, in addition to being the author of Pomm. Without further ado, here&amp;rsquo;s what he has to say on how he uses psql:&lt;/em>&lt;/p>
&lt;h2 id="get-the-most-of-psql" >
&lt;div>
Get the most of psql
&lt;/div>
&lt;/h2>
&lt;p>psql, the PostgreSQL CLI client, is a powerful tool. Sadly, a lot of developers are not aware of its features and instead look for a GUI to provide what they need. Let&amp;rsquo;s fly over what psql can do for you.&lt;/p>
&lt;h2 id="feel-yourself-at-home" >
&lt;div>
Feel yourself at home
&lt;/div>
&lt;/h2>
&lt;p>One of the most common misconceptions people have about CLIs is «they are a poor user interface». C&amp;rsquo;mon, the CLI is &lt;strong>the most efficient user interface ever&lt;/strong>. There is nothing to distract you from what you are doing, and you are far faster when you aren&amp;rsquo;t switching to your mouse all the time. Let&amp;rsquo;s see how we can configure psql to our convenience.&lt;/p>
&lt;p>First, you&amp;rsquo;ll want to choose a nice and fancy &lt;a href="http://hivelogic.com/articles/top-10-programming-fonts">terminal font&lt;/a> like monofur or inconsolata. Do not underestimate the power of the font.&lt;/p>
&lt;p>&lt;img src="https://public.coolkeums.org/github/power_font.png" alt="monofur font in action">&lt;/p>
&lt;p>The nice line style shown above can be set with &lt;code>\pset linestyle unicode&lt;/code> and &lt;code>\pset border 2&lt;/code>. This is just an example of the many environment variables you can play with to get your preferred style of working out of psql.&lt;/p>
&lt;p>For example, I found the character ¤ the most accurate to express nullity (instead of default &lt;code>NULL&lt;/code>). Let&amp;rsquo;s just &lt;code>\pset null ¤&lt;/code> and here it is:&lt;/p>
&lt;pre>&lt;code>SELECT * FROM very_interesting_stat;
┌──────┬──────┬──────┬──────┬──────┐
│ a │ b │ c │ d │ e │
├──────┼──────┼──────┼──────┼──────┤
│ 9.06 │ ¤ │ ¤ │ ¤ │ ¤ │
│ 7.30 │ 3.55 │ 7.57 │ 3.31 │ ¤ │
│ 7.20 │ 5.08 │ ¤ │ 6.58 │ 5.90 │
...
&lt;/code>&lt;/pre>
&lt;p>Another hugely valuable set of environment variables to tweak controls the colors in the prompt. Colors in the prompt are important because they make it easier to spot where output starts and ends between two interactions at the console. The &lt;a href="http://www.postgresql.org/docs/9.2/static/app-psql.html#APP-PSQL-PROMPTING">PROMPT1&lt;/a> variable will even let you set an indicator showing whether or not you are inside a transaction; give this a try for a sweet surprise&amp;hellip;&lt;/p>
&lt;pre>&lt;code>\set PROMPT1 '%[%033[33;1m%]%x%[%033[0m%]%[%033[1m%]%/%[%033[0m%]%R%# '
&lt;/code>&lt;/pre>
&lt;p>I also like to disable the pager by default with &lt;code>\pset pager off&lt;/code> and display the time every issued query takes with &lt;code>\timing&lt;/code>. If you are used to psql, you may notice in the picture above that some content is wrapped. This is the &lt;code>\pset format wrapped&lt;/code> option.&lt;/p>
&lt;p>Of course, writing all that on every connection would be a pain, so just write them in a &lt;code>~/.psqlrc&lt;/code> file, it will be sourced every time psql is launched.&lt;/p>
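Pulling the toggles above together, a ~/.psqlrc collecting every setting discussed so far might look like this (each line is one of the settings named in this post):

```
-- sample ~/.psqlrc
\pset linestyle unicode
\pset border 2
\pset null ¤
\pset pager off
\pset format wrapped
\timing
\set PROMPT1 '%[%033[33;1m%]%x%[%033[0m%]%[%033[1m%]%/%[%033[0m%]%R%# '
```

Every new psql session will pick these up automatically.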
&lt;p>If you are familiar with &lt;code>bash&lt;/code> or other recent unix shells, you might also declare aliases in your configuration file. You can do the same with psql. For example if you want to have a query for slow queries such as from this &lt;a href="/2013/01/10/more-on-postgres-performance/">earlier post&lt;/a> but not have to remember the query every time you can set it up as:&lt;/p>
&lt;pre>&lt;code>\set show_slow_queries
'SELECT
(total_time / 1000 / 60) as total_minutes,
(total_time/calls) as average_time, query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 100;'
&lt;/code>&lt;/pre>
&lt;p>Now, just entering &lt;code>:show_slow_queries&lt;/code> in your psql client will launch this query and give you the results:&lt;/p>
&lt;pre>&lt;code> total_time | avg_time | query
------------------+------------------+------------------------------------------------------------
295.761165833319 | 10.1374053278061 | SELECT id FROM users WHERE email LIKE ?
219.138564283326 | 80.24530822355305 | SELECT * FROM address WHERE user_id = ? AND current = True
&lt;/code>&lt;/pre>
&lt;h2 id="psql-at-your-fingertips" >
&lt;div>
Psql at your fingertips
&lt;/div>
&lt;/h2>
&lt;p>Now you have got a fancy prompt, here is the real question: what can psql do for me? &lt;code>\?&lt;/code> has all of the answers. It has built-in queries to describe almost all database objects from tables to operators, indexes, triggers etc&amp;hellip; with clever auto-completion. Not only completion on tables and columns &amp;ndash; but also on aliases (sweet), &lt;strong>SQL commands&lt;/strong> (w00t) and database objects.&lt;/p>
&lt;p>Now we can enter some SQL commands. As usual, you need to check in the documentation how the heck to write this damn &lt;code>ALTER TABLE&lt;/code>. Relax, psql proposes inline documentation. Just enter &lt;code>\h alter table&lt;/code> (auto complete, w00t) and you&amp;rsquo;ll be OK.&lt;/p>
&lt;h3 id="interacting-with-your-editor" >
&lt;div>
Interacting with your editor
&lt;/div>
&lt;/h3>
&lt;p>psql provides two very handy commands: \e and \i. The latter sources a SQL file in the client&amp;rsquo;s current session. \e edits the last command using the editor defined in the &lt;code>EDITOR&lt;/code> shell environment variable (aka vim). This grants you real editor features when it comes to writing long queries. What psql does is save the buffer in a temporary file and fire up the editor with that file. Once the editor is closed, psql sources the file. Of course, you can use your editor to save queries in other places where they would be under version control, but \e has a serious limitation: it spawns only the last query, even if you sent several queries on the same line. (Note that \r clears psql&amp;rsquo;s last query buffer.)&lt;/p>
&lt;p>Note: &lt;code>\ef my_function&lt;/code> opens stored function source code (With auto completion, I know, it&amp;rsquo;s awesome).&lt;/p>
&lt;p>Vim users can benefit here from Vim&amp;rsquo;s server mode. If you launch a vim specifying a server name (let&amp;rsquo;s say &amp;ldquo;PSQL&amp;rdquo;) somewhere, and set the EDITOR variable as &lt;code>export EDITOR=&amp;quot;vim --servername PSQL --remote-tab-wait&amp;quot;&lt;/code>, then psql will open a new tab in the running vim with the last query and run it as soon as you close that tab. Tmux or gnu/screen users will split their screen to have Vim and psql running in the same terminal window.&lt;/p>
&lt;p>&lt;img src="https://public.coolkeums.org/github/vim_tmux.png" alt="Vim, psql and tmux">&lt;/p>
&lt;h3 id="call-a-friend" >
&lt;div>
Call a friend
&lt;/div>
&lt;/h3>
&lt;p>Vim power users know it is possible to pipe a buffer (or selection) directly into a program, which can be &amp;hellip; psql (using the &lt;code>:w !psql&lt;/code> syntax). Even from the shell, you might want to take advantage of the fantastic &lt;code>\copy&lt;/code> feature that loads formatted files into the database (I use it to load apache logs). But always having to specify connection parameters is a hassle. Let&amp;rsquo;s use the shell environment instead. psql is sensitive to the following variables:&lt;/p>
&lt;ul>
&lt;li>PGDATABASE&lt;/li>
&lt;li>PGHOST&lt;/li>
&lt;li>PGPORT&lt;/li>
&lt;li>PGUSER&lt;/li>
&lt;li>PGCLUSTER (debian wrapper).&lt;/li>
&lt;/ul>
&lt;p>Set them once and for all in your shell environment and call &lt;code>psql&lt;/code> to connect to the database. In case you want to skip the password prompt, you can store your password in a file named &lt;code>.pgpass&lt;/code> in your home directory with 600 permissions (do not do that on shared or exposed computers). Although this is nice for development database servers, I do NOT recommend this for production servers, since it should not be easy to mess with them.&lt;/p>
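For example, you might add something like the following to your shell profile — the host, user, and database names here are placeholders to substitute with your own:

```shell
# Example values only -- substitute your own host, user and database.
export PGHOST=localhost
export PGPORT=5432
export PGUSER=craig
export PGDATABASE=craig

# With these set, a bare `psql` connects with no flags at all.
echo "psql will connect to $PGUSER@$PGHOST:$PGPORT/$PGDATABASE"
```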
&lt;p>The resource for additional information is &amp;hellip; the man page and the &lt;a href="http://www.postgresql.org/docs/9.2/static/index.html">Postgres Docs&lt;/a>. The &lt;a href="http://www.postgresql.org/docs/9.2/static/index.html">PostgreSQL documentation&lt;/a> is an example of what software reference documentation should be. Enjoy!&lt;/p></description></item><item><title>How I work with Postgres – psql, My PostgreSQL Admin</title><link>/2013/02/13/How-I-Work-With-Postgres/</link><pubDate>Wed, 13 Feb 2013 12:55:56 -0800</pubDate><guid>/2013/02/13/How-I-Work-With-Postgres/</guid><description>&lt;p>On at least a weekly basis, and not uncommonly multiple times in a single week, I get this question:&lt;/p>
&lt;blockquote>
&lt;p>&lt;a href="https://twitter.com/neilmiddleton">@neilmiddleton&lt;/a>
I&amp;rsquo;ve been hunting for a nice PG interface that works within other things. PGAdmin kinda works, except the SQL editor is a piece of shit&lt;/p>
&lt;/blockquote>
&lt;p>Sometimes it leans more toward: what is the Sequel Pro equivalent for Postgres? My default answer is that I just use psql, though I then have to go on to explain how I use it. For those just interested you can read more below or just get the highlights here:&lt;/p>
&lt;ul>
&lt;li>Set your default &lt;code>EDITOR&lt;/code> then use \e&lt;/li>
&lt;li>On postgres 9.2 and up &lt;code>\x auto&lt;/code> is your friend&lt;/li>
&lt;li>Set history to unlimited&lt;/li>
&lt;li>&lt;code>\d&lt;/code> all the things&lt;/li>
&lt;/ul>
&lt;p>Before going into detail on why psql works perfectly fine as an interface I want to rant for a minute about what the problems with current editors are and where I expect them to go in the future. First, this is not a knock on the work that&amp;rsquo;s been done on previous ones; for their time PgAdmin, phpPgAdmin, and others were valuable tools, but we&amp;rsquo;re coming to a point where there&amp;rsquo;s a broader set of database users than ever before, and empowering them is becoming ever more important.&lt;/p>
&lt;p>Empowering developers, DBAs, product people, marketers and others to be comfortable with their database will lead to more people taking advantage of what&amp;rsquo;s in their data. &lt;a href="/2013/01/10/more-on-postgres-performance/">pg_stat_statements&lt;/a> was a great start to this, laying a great foundation for valuable information being captured. Even with all of the powerful stats captured in PostgreSQL&amp;rsquo;s statistics, so many are still terrified when they see something like:&lt;/p>
&lt;pre>&lt;code> QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Hash Join (cost=4.25..8.62 rows=100 width=107) (actual time=0.126..0.230 rows=100 loops=1)
Hash Cond: (purchases.user_id = users.id)
-&amp;gt; Seq Scan on purchases (cost=0.00..3.00 rows=100 width=84) (actual time=0.012..0.035 rows=100 loops=1)
-&amp;gt; Hash (cost=3.00..3.00 rows=100 width=27) (actual time=0.097..0.097 rows=100 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 6kB
-&amp;gt; Seq Scan on users (cost=0.00..3.00 rows=100 width=27) (actual time=0.007..0.042 rows=100 loops=1)
Total runtime: 0.799 ms
(7 rows)
&lt;/code>&lt;/pre>
&lt;p>By surfacing this information in a digestible form, such as building on top of &lt;code>pg_stat_statements&lt;/code> tools such as &lt;a href="http://datascope.heroku.com">datascope&lt;/a> by &lt;a href="http://www.twitter.com/leinweber">@leinweber&lt;/a>, and getting this to be part of the default admin, we will truly begin empowering a new set of users.&lt;/p>
&lt;p>But enough of a detour, those tools aren&amp;rsquo;t available today. If you&amp;rsquo;re interested in helping build them to make the community better, please reach out. For now I live in a world where I&amp;rsquo;m quite content with simple ole &lt;code>psql&lt;/code>; here&amp;rsquo;s how:&lt;/p>
&lt;h3 id="editor" >
&lt;div>
Editor
&lt;/div>
&lt;/h3>
&lt;p>Once you&amp;rsquo;ve exported your preferred editor to the environment variable &lt;code>EDITOR&lt;/code>, running \e will allow you to view and edit your last run query in your editor of choice. This works for vim, emacs, or even Sublime Text.&lt;/p>
&lt;pre>&lt;code>export EDITOR=subl
psql
\e
&lt;/code>&lt;/pre>
&lt;p>Gives me:&lt;/p>
&lt;p>&lt;img src="https://f.cl.ly/items/2I0f3M0B1T3k0d290v3k/Screenshot_2_12_13_9_58_AM.png" alt="sublime text">&lt;/p>
&lt;p>&lt;em>Note you need to make sure you connect with psql and have your editor set, once you do that saving and exiting the file will then execute the query&lt;/em>&lt;/p>
&lt;h3 id="x-auto" >
&lt;div>
\x auto
&lt;/div>
&lt;/h3>
&lt;p>psql has long had a method of formatting output. You can toggle this on and off easily by just running the &lt;code>\x&lt;/code> command. Running a basic query you get the output:&lt;/p>
&lt;pre>&lt;code>SELECT *
FROM users
LIMIT 1;
id | first_name | last_name | email | data | created_at | updated_at | last_login
----+------------+-----------+----------------------------+------------+---------------------+---------------------+---------------------
1 | Rosemary | Wassink | Rosemary.Wassink@yahoo.com | &amp;quot;sex&amp;quot;=&amp;gt;&amp;quot;F&amp;quot; | 2010-07-01 18:16:00 | 2011-05-14 11:47:00 | 2011-06-07 23:04:00
&lt;/code>&lt;/pre>
&lt;p>Toggling the output and re-running the same query, we can see how it&amp;rsquo;s now formatted:&lt;/p>
&lt;pre>&lt;code>\x
Expanded display is on.
craig=# SELECT * from users limit 1;
-[ RECORD 1 ]--------------------------
id | 1
first_name | Rosemary
last_name | Wassink
email | Rosemary.Wassink@yahoo.com
data | &amp;quot;sex&amp;quot;=&amp;gt;&amp;quot;F&amp;quot;
created_at | 2010-07-01 18:16:00
updated_at | 2011-05-14 11:47:00
last_login | 2011-06-07 23:04:00
&lt;/code>&lt;/pre>
&lt;p>Using &lt;code>\x auto&lt;/code> will automatically put the output in what psql believes is the most intelligible format to read it in.&lt;/p>
&lt;h3 id="psql-history" >
&lt;div>
psql history
&lt;/div>
&lt;/h3>
&lt;p>Hopefully this needs no justification&amp;hellip; having an unlimited history of all your queries is incredibly handy. Ensuring you set the following environment variables will ensure you never lose that query you ran several months ago again:&lt;/p>
&lt;pre>&lt;code>export HISTFILESIZE=
export HISTSIZE=
&lt;/code>&lt;/pre>
&lt;h3 id="d" >
&lt;div>
\d
&lt;/div>
&lt;/h3>
&lt;p>And while the last on the list one of the first things I do when connecting to any database is check out whats in it. I don&amp;rsquo;t do this by running a bunch of queries but rather checking out the schema and then poking at definitions of specific tables. &lt;code>\d&lt;/code> and variations on it are incredibly handy for this. Here&amp;rsquo;s a few highlights below:&lt;/p>
&lt;p>Listing all relations with simply &lt;code>\d&lt;/code>:&lt;/p>
&lt;pre>&lt;code>\d
List of relations
Schema | Name | Type | Owner
--------+------------------+---------------+-------
public | products | table | craig
public | products_id_seq | sequence | craig
public | purchases | table | craig
public | purchases_id_seq | sequence | craig
public | redis_db0 | foreign table | craig
public | users | table | craig
public | users_id_seq | sequence | craig
(7 rows)
&lt;/code>&lt;/pre>
&lt;p>List only tables with &lt;code>\dt&lt;/code>:&lt;/p>
&lt;pre>&lt;code>\dt
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+-------
public | products | table | craig
public | purchases | table | craig
public | users | table | craig
(3 rows)
&lt;/code>&lt;/pre>
&lt;p>Describe a specific relation with &lt;code>\d RELATIONNAMEHERE&lt;/code>:&lt;/p>
&lt;pre>&lt;code>\d users
Table &amp;quot;public.users&amp;quot;
Column | Type | Modifiers
------------+-----------------------------+----------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
first_name | character varying(50) |
last_name | character varying(50) |
email | character varying(255) |
data | hstore |
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
last_login | timestamp without time zone |
&lt;/code>&lt;/pre>
&lt;p>One more pro-tip: if you&amp;rsquo;re running a transaction involving many tables and forget which are involved, you can run &amp;lsquo;\d *transaction*&amp;rsquo; and it&amp;rsquo;ll display the tables currently affected.&lt;/p>
&lt;p>&lt;em>Have a tool you prefer, have something you use daily in psql that I missed, or interested in helping create a new admin experience please reach out and lets talk craig.kerstiens at gmail.com&lt;/em>&lt;/p></description></item><item><title>Introducing django-db-tools</title><link>/2013/02/08/Introducing-django-db-tools/</link><pubDate>Fri, 08 Feb 2013 12:55:56 -0800</pubDate><guid>/2013/02/08/Introducing-django-db-tools/</guid><description>&lt;p>For any successful web application there is likely to come a time when you need to conduct some large migration on the backend. I dont mean simple add a &lt;a href="http://www.craigkerstiens.com/2012/05/07/why-postgres-part-2/">column here or add an index there&lt;/a>, but rather truly sizeable migrations&amp;hellip; Going from &lt;a href="http://lanyrd.com/blog/2012/lanyrds-big-move/">MySQL to Postgres&lt;/a> or migrating from an older version of Postgres such as a &lt;a href="http://blog.sendhub.com/post/30041247598/how-to-upgrade-a-legacy-heroku-database">32 bit instance&lt;/a> to a newer 64 bit instance. In these cases the default approach is to just schedule downtime often throwing up a splash screen saying so.&lt;/p>
&lt;p>For many sites this approach is simply wrong and lazy; with little effort you can improve the experience and thereby ease the burden of conducting these types of migrations. By having the ability to turn your site into a read-only mode, which &lt;a href="http://twitter.com/simonw">Simon Willison&lt;/a> talked about in his post on Lanyrd, you can still continue to operate, just in a limited capacity. &lt;a href="http://www.aeracode.org/2012/11/13/one-change-not-enough/">Andrew Godwin&lt;/a> further talks about some of this as well in regards to the Lanyrd move and even includes the script they used to &lt;a href="https://github.com/lanyrd/mysql-postgresql-converter/">migrate data from MySQL to Postgres&lt;/a>. Though just in talking with Simon about this a week ago, it occurred to me they had not released the code for their read-only mode.&lt;/p>
&lt;p>Finally onto the announcing, today I&amp;rsquo;m releasing &lt;a href="https://github.com/craigkerstiens/django-db-tools">django-db-tools&lt;/a>. This is currently a very lightweight utility that allows you to flip your site into two modes.&lt;/p>
&lt;h3 id="anonymous-mode" >
&lt;div>
Anonymous Mode
&lt;/div>
&lt;/h3>
&lt;p>For sites that offer the bulk of their data to unauthenticated users, anonymous mode will be what you want. This ensures all users appear logged out and thus cannot interact with data. To enable anonymous mode you&amp;rsquo;d simply set the environment variable or config var on Heroku as follows:&lt;/p>
&lt;pre>&lt;code>READ_ONLY_MODE = True
&lt;/code>&lt;/pre>
&lt;h3 id="restricting-posts" >
&lt;div>
Restricting POSTs
&lt;/div>
&lt;/h3>
&lt;p>The other bucket of sites is those that allow users to stay logged in but not insert data. Django did not appear to have a convenient means of knowing whether data was actually being inserted into the DB or not. As a good practice, data being inserted should arrive via an HTTP POST. The &lt;code>GET_ONLY_MODE&lt;/code> mimics all POSTs as if they were sent via GETs, thus hopefully eliminating inserting data into your application. To turn it on simply set the environment variable or config var on Heroku to:&lt;/p>
&lt;pre>&lt;code>GET_ONLY_MODE = True
&lt;/code>&lt;/pre>
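To make the idea concrete, here is a stripped-down sketch of what a GET-only rewrite can look like — `SimpleRequest` is a hypothetical stand-in for Django's request object, not the actual django-db-tools middleware, which hooks into MIDDLEWARE_CLASSES instead:

```python
import os

class SimpleRequest:
    """A toy stand-in for Django's HttpRequest, for illustration only."""
    def __init__(self, method, params):
        self.method = method
        self.GET = {}
        self.POST = params

def get_only_middleware(request):
    """If GET_ONLY_MODE is on, rewrite POSTs so they behave as GETs."""
    if os.environ.get("GET_ONLY_MODE") == "True" and request.method == "POST":
        request.method = "GET"
        request.GET = request.POST
        request.POST = {}
    return request

os.environ["GET_ONLY_MODE"] = "True"
req = get_only_middleware(SimpleRequest("POST", {"name": "craig"}))
print(req.method)  # GET
```

Downstream views then see only GETs, so nothing should reach the database as a write.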
&lt;h3 id="installing" >
&lt;div>
Installing
&lt;/div>
&lt;/h3>
&lt;p>The tool itself is largely middleware, to install:&lt;/p>
&lt;ol>
&lt;li>Run &lt;code>pip install django-db-tools&lt;/code> or add it to your &lt;code>requirements.txt&lt;/code>&lt;/li>
&lt;li>Add &lt;code>db_tools&lt;/code> to your &lt;code>INSTALLED_APPS&lt;/code> in your &lt;code>settings.py&lt;/code>&lt;/li>
&lt;li>Add &lt;code>'dbtools.middleware.ReadOnlyMiddleware',&lt;/code> to your &lt;code>MIDDLEWARE_CLASSES&lt;/code> in &lt;code>settings.py&lt;/code>&lt;/li>
&lt;/ol>
&lt;h3 id="contributing" >
&lt;div>
Contributing
&lt;/div>
&lt;/h3>
&lt;p>As with all code this is largely a work in progress. There&amp;rsquo;s many items still to do such as providing default copy and error pages and potentially handling other edge cases. I&amp;rsquo;d welcome others to contribute and give feedback if they find it helpful or how it can be improved on Github.&lt;/p></description></item><item><title>More on Postgres Performance</title><link>/2013/01/10/More-on-Postgres-Performance/</link><pubDate>Thu, 10 Jan 2013 12:55:56 -0800</pubDate><guid>/2013/01/10/More-on-Postgres-Performance/</guid><description>&lt;p>If you missed my previous post on &lt;a href="/2012/10/01/understanding-postgres-performance/">Understanding Postgres Performance&lt;/a> its a great starting point. On this particular post I&amp;rsquo;m going to dig in to some real life examples of optimizing queries and indexes.&lt;/p>
&lt;h3 id="it-all-starts-with-stats" >
&lt;div>
It all starts with stats
&lt;/div>
&lt;/h3>
&lt;p>I wrote about some of the &lt;a href="https://postgres.heroku.com/blog/past/2012/12/6/postgres_92_now_available/">great new features in Postgres 9.2&lt;/a> in the recent announcement of support for Postgres 9.2 on &lt;a href="https://www.heroku.com">Heroku&lt;/a>. One of those awesome features is &lt;a href="http://www.postgresql.org/docs/9.2/static/pgstatstatements.html">pg_stat_statements&lt;/a>. It&amp;rsquo;s not commonly known how much information Postgres keeps about your database (beyond the data of course), but in reality it keeps a great deal, ranging from basic stuff like table size to cardinality of joins to distribution of indexes; and with pg_stat_statements it keeps a normalized record of the queries that are run.&lt;/p>
&lt;p>First you&amp;rsquo;ll want to turn on pg_stat_statements:&lt;/p>
&lt;pre>&lt;code>CREATE extension pg_stat_statements;
&lt;/code>&lt;/pre>
&lt;p>This means it would record both:&lt;/p>
&lt;pre>&lt;code>SELECT id
FROM users
WHERE email LIKE 'craig@heroku.com';
&lt;/code>&lt;/pre>
&lt;p>and&lt;/p>
&lt;pre>&lt;code>SELECT id
FROM users
WHERE email LIKE 'craig.kerstiens@gmail.com';
&lt;/code>&lt;/pre>
&lt;p>To a normalized form which looks like this:&lt;/p>
&lt;pre>&lt;code>SELECT id
FROM users
WHERE email LIKE ?;
&lt;/code>&lt;/pre>
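You can get an intuition for this normalization with a few lines of Python — pg_stat_statements actually normalizes at the parse-tree level inside Postgres, so this regex version is only a rough illustration of the idea:

```python
import re

def normalize(sql):
    """Replace literals with ? so queries of the same shape collapse together."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return sql

a = normalize("SELECT id FROM users WHERE email LIKE 'craig@heroku.com'")
b = normalize("SELECT id FROM users WHERE email LIKE 'craig.kerstiens@gmail.com'")
print(a)       # SELECT id FROM users WHERE email LIKE ?
print(a == b)  # True
```

Both queries collapse into one normalized entry, which is what lets the statistics below aggregate per query shape.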
&lt;h3 id="understanding-them-from-afar" >
&lt;div>
Understanding them from afar
&lt;/div>
&lt;/h3>
&lt;p>While Postgres collects a great deal of this information dissecting it to something useful is sometimes more mystery than it should be. This simple query will show a few very key pieces of information that allow you to begin optimizing:&lt;/p>
&lt;pre>&lt;code>SELECT
(total_time / 1000 / 60) as total_minutes,
(total_time/calls) as average_time,
query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 100;
&lt;/code>&lt;/pre>
&lt;p>The above query shows three key things:&lt;/p>
&lt;ol>
&lt;li>The total time a query has occupied against your system in minutes&lt;/li>
&lt;li>The average time it takes to run in milliseconds&lt;/li>
&lt;li>The query itself&lt;/li>
&lt;/ol>
&lt;p>Giving an output something like:&lt;/p>
&lt;pre>&lt;code> total_time | avg_time | query
------------------+------------------+------------------------------------------------------------
295.761165833319 | 10.1374053278061 | SELECT id FROM users WHERE email LIKE ?
219.138564283326 | 80.24530822355305 | SELECT * FROM address WHERE user_id = ? AND current = True
(2 rows)
&lt;/code>&lt;/pre>
&lt;h3 id="what-to-optimize" >
&lt;div>
What to optimize
&lt;/div>
&lt;/h3>
&lt;p>A general rule of thumb is that most of your very common queries that return 1 or a small set of records should return in ~ 1 ms. In some cases there may be queries that regularly run in 4-5 ms, but in most cases ~ 1 ms or less is an ideal.&lt;/p>
&lt;p>To pick where to begin I usually attempt to strike some balance between total time and long average time. In this case I&amp;rsquo;d probably start with the second: on the first I could likely shave an order of magnitude off, while on the second I&amp;rsquo;m hopeful to shave two orders of magnitude off, thus reducing the time spent on that query from a cumulative 220 minutes down to 2 minutes.&lt;/p>
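The arithmetic behind that choice, using the two example rows above:

```python
# Cumulative minutes from the example pg_stat_statements output above.
first_total, second_total = 295.76, 219.14

# Projected totals after optimization: one order of magnitude off the
# first query versus two orders of magnitude off the second.
first_after = first_total / 10
second_after = second_total / 100

print(round(first_after, 1))   # 29.6 minutes remain
print(round(second_after, 1))  # 2.2 minutes remain
```

Knocking the second query down two orders of magnitude buys the bigger qualitative win, even though both queries start with similar cumulative time.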
&lt;h3 id="optimizing" >
&lt;div>
Optimizing
&lt;/div>
&lt;/h3>
&lt;p>From here you probably want to first review my earlier detail on understanding the explain plan. I want to highlight some of this with a more specific case based on the second query above. That second query, on an example data set, does have an index on user_id and yet there are still high query times. To start to get an idea of why, I would run:&lt;/p>
&lt;pre>&lt;code>EXPLAIN ANALYZE
SELECT *
FROM address
WHERE user_id = 245
AND current = True
&lt;/code>&lt;/pre>
&lt;p>This would yield results:&lt;/p>
&lt;pre>&lt;code> QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4690.88..4690.88 rows=1 width=0) (actual time=519.288..519.289 rows=1 loops=1)
-&amp;gt; Nested Loop (cost=0.00..4690.66 rows=433 width=0) (actual time=15.302..519.076 rows=213 loops=1)
-&amp;gt; Index Scan using idx_address_userid on address (cost=0.00..232.52 rows=23 width=4) (actual time=10.143..62.822 rows=1 loops=8)
Index Cond: (user_id = 245)
Filter: current
Rows Removed by Filter: 14
Total runtime: 219.428 ms
(7 rows)
&lt;/code>&lt;/pre>
&lt;p>Hopefully without being too overwhelmed by this, having read the other detail on &lt;a href="/2012/10/01/understanding-postgres-performance/">query plans&lt;/a>, we can see that it is using an index as expected. The difference is it&amp;rsquo;s having to fetch 15 rows from the index and then discard the bulk of them. The number of rows discarded is showcased by the line:&lt;/p>
&lt;pre>&lt;code>Rows Removed by Filter: 14
&lt;/code>&lt;/pre>
&lt;p>&lt;em>This is just one more of the many improvements in Postgres 9.2 alongside pg_stat_statements.&lt;/em>&lt;/p>
&lt;p>To further optimize this we would create a conditional OR a composite index. A conditional index would cover only the rows where current = true, whereas a composite index would index both values. A conditional is commonly more valuable when the column has a small set of possible values, while a composite is better when you have high variability of values. Creating the conditional index:&lt;/p>
&lt;pre>&lt;code>CREATE INDEX CONCURRENTLY idx_address_userid_current ON address(user_id) WHERE current = True;
&lt;/code>&lt;/pre>
&lt;p>We can then see the query plan is now even further improved as we&amp;rsquo;d hope:&lt;/p>
&lt;pre>&lt;code>EXPLAIN ANALYZE
SELECT *
FROM address
WHERE user_id = 245
AND current = True
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4690.88..4690.88 rows=1 width=0) (actual time=519.288..519.289 rows=1 loops=1)
-&amp;gt; Index Scan using idx_address_userid_current on address (cost=0.00..232.52 rows=23 width=4) (actual time=10.143..62.822 rows=1 loops=8)
Index Cond: ((user_id = 245) AND (current = True))
Total runtime: 0.728 ms
(1 rows)
&lt;/code>&lt;/pre>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Sharding your database</title><link>/2012/11/30/Sharding-your-database/</link><pubDate>Fri, 30 Nov 2012 12:55:56 -0800</pubDate><guid>/2012/11/30/Sharding-your-database/</guid><description>&lt;p>I&amp;rsquo;m increasingly encountering users on &lt;a href="http://www.heroku.com">Heroku&lt;/a> that are running into the need to &lt;a href="http://en.wikipedia.org/wiki/Shard_(database_architecture)">shard&lt;/a> their data. For most users this is something you delay as long as possible, as you can generally go for some time before you have to worry about it. Additionally, scaling up your database is often a reasonable approach early on, and something I encourage as a starting point, as scaling up is easy to do with databases. However, for the 1% of users that do need to shard, when the time comes many are left wondering where to start, hence the following guide.&lt;/p>
&lt;h3 id="what-and-why" >
&lt;div>
What and Why
&lt;/div>
&lt;/h3>
&lt;p>Sharding is the process of splitting up your data so it resides in different tables or often different physical databases. Sharding is helpful when you have some specific set of data that outgrows either storage or reasonable performance within a single database.&lt;/p>
&lt;h3 id="logical-shards" >
&lt;div>
Logical Shards
&lt;/div>
&lt;/h3>
&lt;p>When initially implementing sharding you&amp;rsquo;ll want to create an arbitrary number of logical shards. This will allow you to change less code later when it comes to adding more shards. You&amp;rsquo;ll also want to make the number of shards a power of 2. Generally I&amp;rsquo;d recommend 1024 as a good number for most services; I believe Instagram actually used 4096. Either can be an appropriate number. For simplicity&amp;rsquo;s sake let&amp;rsquo;s start with an example of using 4 logical shards. First let&amp;rsquo;s look at an example set of users:&lt;/p>
&lt;pre>&lt;code> id | email | name
----+---------------------------+-----------------
1 | craig.kerstiens@gmail.com | Craig Kerstiens
2 | john.doe@gmail.com | John Doe
3 | jane.doe@gmail.com | Jane Doe
4 | user4@gmail.com | User 4
5 | user5@gmail.com | User 5
6 | user6@gmail.com | User 6
7 | user7@gmail.com | User 7
8 | user8@gmail.com | User 8
&lt;/code>&lt;/pre>
&lt;p>Dividing these up into logical shards we&amp;rsquo;re going to have something that looks roughly like this:&lt;/p>
&lt;p>&lt;img src="https://f.v1.n0.cdn.getcloudapp.com/items/0Q1A38191Q0N3G0L413K/Screenshot%2011:29:12%209:45%20AM.png" alt="sharding layout">&lt;/p>
&lt;p>It&amp;rsquo;s important when sharding that you find a routing mechanism that doesn&amp;rsquo;t require you to hit the database. The above example uses the ID of the row inside the database; instead we&amp;rsquo;re likely going to want to determine the shard based on a hash of some value such as the email:&lt;/p>
&lt;pre>&lt;code>logical_shard = hash(User.email) % 4
&lt;/code>&lt;/pre>
&lt;h3 id="physical-shards" >
&lt;div>
Physical Shards
&lt;/div>
&lt;/h3>
&lt;p>From here we&amp;rsquo;ll then take the logical shards and create actual physical shards. If you have a single physical shard you&amp;rsquo;re using a single database, but the rest of your application code is ready to handle additional shards. For now let&amp;rsquo;s use an example of two physical shards; the end result would be dividing our data up something like this:&lt;/p>
&lt;p>&lt;img src="https://f.v1.n0.cdn.getcloudapp.com/items/0A3b3O3A3K28043Y2s0j/Screenshot%2011:29:12%209:46%20AM.png" alt="sharding layout">&lt;/p>
&lt;p>The physical shard to access can easily be computed by taking the modulus of the logical shard by the number of physical shards that exist:&lt;/p>
&lt;pre>&lt;code>total_physical_shards = 2
total_logical_shards = 4
logical_shard = hash(User.email) % total_logical_shards
physical_shard = logical_shard % total_physical_shards
&lt;/code>&lt;/pre>
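&lt;p>Pulling the two formulas together, here&amp;rsquo;s a minimal Python sketch of the routing logic, using the example shard counts above. Note the use of a stable hash (md5) rather than Python&amp;rsquo;s built-in hash(), which is salted per process and so can&amp;rsquo;t be used for routing that must stay consistent across restarts. The function names are illustrative.&lt;/p>

```python
import hashlib

TOTAL_LOGICAL_SHARDS = 4
TOTAL_PHYSICAL_SHARDS = 2

def logical_shard(email: str) -> int:
    # Stable hash: md5 of the email, taken modulo the logical shard count.
    # Python's built-in hash() varies between interpreter runs, so it can't
    # be used for routing that must stay consistent across deploys.
    digest = hashlib.md5(email.encode("utf-8")).hexdigest()
    return int(digest, 16) % TOTAL_LOGICAL_SHARDS

def physical_shard(email: str) -> int:
    # Map the logical shard onto one of the physical databases.
    return logical_shard(email) % TOTAL_PHYSICAL_SHARDS
```

&lt;p>With 1024 logical shards you&amp;rsquo;d only change the constants; the routing logic stays the same.&lt;/p>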
&lt;h3 id="generating-ids-primary-keys" >
&lt;div>
Generating IDs (Primary Keys)
&lt;/div>
&lt;/h3>
&lt;p>As you&amp;rsquo;re distributing data across multiple databases you&amp;rsquo;ll want to avoid using a sequential integer as your primary key. That would cause keys to be duplicated across your databases and make for a headache when attempting to report against your data. Instead the ideal is to use a UUID as the primary key. By using a UUID and generating it in either your application code or within your database you ensure each user ID is actually unique.&lt;/p>
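&lt;p>As a sketch, generating such a key in application code is a one-liner with Python&amp;rsquo;s standard uuid module:&lt;/p>

```python
import uuid

# uuid4() is random and effectively collision-free, so IDs generated
# independently on different shards won't clash the way sequential
# integers would.
new_user_id = uuid.uuid4()
print(new_user_id)
```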
&lt;h3 id="adding-capacity" >
&lt;div>
Adding Capacity
&lt;/div>
&lt;/h3>
&lt;p>The best case scenario for most web applications, as was the case for &lt;a href="http://www.databasesoup.com/2012/04/sharding-postgres-with-instagram.html">Instagram&lt;/a>, is to have to scale beyond their initial capacity. In order to do this you&amp;rsquo;ll simply expand the number of physical shards: move data from one physical shard to another, then remove that data from the old physical shard. It&amp;rsquo;s also generally a good practice to grow your physical shards in powers of 2, the same way you would your logical shards. Let&amp;rsquo;s take a look at a clearer example of how we would do this using the &lt;a href="https://postgres.heroku.com">Heroku Postgres Service&lt;/a>&amp;hellip;&lt;/p>
&lt;p>First we&amp;rsquo;re going to add a &lt;a href="https://postgres.heroku.com/blog/past/2012/10/25/announcing_follow/">follower&lt;/a> for each shard we have:&lt;/p>
&lt;p>&lt;img src="https://f.v1.n0.cdn.getcloudapp.com/items/1N233k203j2d1O2l2w47/Screenshot%2011:29:12%202:36%20PM.png" alt="Followers">&lt;/p>
&lt;p>We&amp;rsquo;re then going to promote our followers to be their own independent databases which can accept writes. This means we&amp;rsquo;ll have two copies that can be written to with the same data:&lt;/p>
&lt;p>&lt;img src="https://f.v1.n0.cdn.getcloudapp.com/items/3Q1D2O0J0o2b0e051t46/Screenshot%2011:29:12%202:39%20PM.png" alt="Promotion">&lt;/p>
&lt;p>At this point you can update your application code with the new number of physical shards and it will begin writing data to the appropriate place. Of course you do still want to clean up the extra data: you&amp;rsquo;ll want to remove the rows that don&amp;rsquo;t belong in each shard. For example, an id of 3 wouldn&amp;rsquo;t belong in physical shard 1 any more. Now we can run a background process to clean up such data:&lt;/p>
&lt;p>&lt;img src="https://f.v1.n0.cdn.getcloudapp.com/items/0a2r132M1f1m171B3y3R/Screenshot%2011:29:12%202:42%20PM.png" alt="Cleanup">&lt;/p>
&lt;h3 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h3>
&lt;p>While many applications may never need to scale out their database, when they do, sharding can be both straightforward and effective. While I would encourage many to scale up first as it is an easy option, hopefully this provides further guidance on how to scale out. For those that do anticipate this need, planning for it early with key decisions such as using UUIDs can make the process less painful.&lt;/p>
&lt;p>This article of course only grazes the surface; if there&amp;rsquo;s interest from readers there will be more specifics to follow with actual code examples.&lt;/p></description></item><item><title>How I Write SQL</title><link>/2012/11/17/How-I-Write-SQL/</link><pubDate>Sat, 17 Nov 2012 12:55:56 -0800</pubDate><guid>/2012/11/17/How-I-Write-SQL/</guid><description>&lt;p>I recently got asked by &lt;a href="http://rzrsharp.net/">a friend&lt;/a> and former co-worker how I write SQL. At first this caught me by surprise and I assumed there was nothing different, but after a few additional comments it became clear most people have no approach for creating clean, readable SQL. So without further ado here&amp;rsquo;s how I write SQL, with a built-up example query.&lt;/p>
&lt;p>First let&amp;rsquo;s understand an example schema:&lt;/p>
&lt;pre>&lt;code># \dt
Schema | Name | Type | Owner
--------+----------------------------+-------+----------------
public | app_rating | table | craig
public | app_recommendation | table | craig
public | app_userprofile | table | craig
public | app_wine | table | craig
public | app_winemakeup | table | craig
public | app_winery | table | craig
public | auth_user | table | craig
&lt;/code>&lt;/pre>
&lt;p>The above schema contains wines from wineries, which users give ratings and notes to. Especially relevant is the app_rating table; it contains a variety of things we&amp;rsquo;re going to want to report against:&lt;/p>
&lt;pre>&lt;code># \d app_rating
Table &amp;quot;public.app_rating&amp;quot;
Column | Type | Modifiers
------------+--------------------------+---------------------------------------------------------
id | integer | not null default nextval('app_rating_id_seq'::regclass)
user_id | integer | not null
wine_id | integer | not null
rated_at | date | not null
rating | integer | not null
notes | text |
tags | character varying(255)[] |
created_at | timestamp with time zone | not null
&lt;/code>&lt;/pre>
&lt;p>Most of the above should be pretty straightforward, though if you&amp;rsquo;re unfamiliar with Arrays in Postgres check out &lt;a href="/2012/08/20/arrays-in-postgres/">this earlier article&lt;/a>&lt;/p>
&lt;p>Given all this data, let&amp;rsquo;s say we want to produce a query that for a given wine returns the winery, the average rating, and the tags for it. Diving in, I&amp;rsquo;ll typically start by creating each key part then pulling it all together. Let&amp;rsquo;s start with grabbing the average.&lt;/p>
&lt;p>But first some basic structure, for maximum readability I make sure to use all caps for reserved SQL words. For a large query I make sure all my columns/conditions are on their own line. So to get the average it would look something like this:&lt;/p>
&lt;pre>&lt;code>SELECT
  avg(rating),
  wine_id
FROM
  app_rating
GROUP BY
  wine_id;
&lt;/code>&lt;/pre>
&lt;p>Next I&amp;rsquo;ll work with the array of tags which has some specific things to Postgres:&lt;/p>
&lt;pre>&lt;code>SELECT DISTINCT
  unnest(tags) as tag,
  wine_id
FROM
  app_rating
GROUP BY
  wine_id, tags;
&lt;/code>&lt;/pre>
&lt;p>Finally I&amp;rsquo;m going to put it all together. This is going to have an additional query to get the winery and the wine name as well. We&amp;rsquo;re also going to use CTE&amp;rsquo;s (Common Table Expressions), think of these as temporary views that can make your query more readable:&lt;/p>
&lt;pre>&lt;code>WITH
  wine_ratings as (
    SELECT
      avg(rating) as rating,
      wine_id
    FROM
      app_rating
    GROUP BY
      wine_id),
  wine_tags as (
    SELECT DISTINCT
      unnest(tags) as tag,
      wine_id
    FROM
      app_rating
    GROUP BY
      wine_id, tags),
  wine_detail as (
    SELECT
      app_wine.name as name,
      app_wine.id,
      app_winery.name as winery
    FROM
      app_wine,
      app_winery
    WHERE app_wine.winery_id = app_winery.id
  )
SELECT
  name,
  rating,
  array_agg(tag),
  winery
FROM
  wine_ratings,
  wine_detail
LEFT OUTER JOIN
  wine_tags ON wine_detail.id = wine_tags.wine_id
WHERE wine_detail.id = wine_ratings.wine_id
GROUP BY
  name,
  rating,
  winery
ORDER BY
  rating DESC
&lt;/code>&lt;/pre>
&lt;p>One thing to point out is that &lt;code>SELECT&lt;/code>, &lt;code>FROM&lt;/code> and &lt;code>ORDER BY&lt;/code> are followed by a new line. When I have multiple &lt;code>WHERE&lt;/code> conditions I ensure the &lt;code>AND&lt;/code> and its condition occur on the same line. This is intentional, making those conditions easier to read as well as easy to remove or add. The key to keeping it readable is an extra two spaces before the &lt;code>AND&lt;/code> so the condition aligns with the one above. This would appear something similar to:&lt;/p>
&lt;pre>&lt;code>SELECT foo
FROM bar
WHERE foo.id = bar.foo_id
  AND foo.created_at &amp;gt; now() - '7 days'::INTERVAL
&lt;/code>&lt;/pre>
&lt;p>And just for an example we get this result from the query:&lt;/p>
&lt;pre>&lt;code> name | rating | array_agg | winery
-----------------------+--------+--------------------+------------------------
Bordeaux Blend | 5.0 | {'dry', 'smooth'} | Chateau Rahoul
Cabernet Franc | 5.0 | {'chocolate'} | Beaucanon
Cabernet Sauvignon | 5.0 | {'young', 'dry'} | Hawkes
&lt;/code>&lt;/pre>
&lt;p>While very long, this should ideally be quite legible.&lt;/p></description></item><item><title>Using Postgres Arrays in Django</title><link>/2012/11/06/django-and-arrays/</link><pubDate>Tue, 06 Nov 2012 12:55:56 -0800</pubDate><guid>/2012/11/06/django-and-arrays/</guid><description>&lt;p>A few weeks back I did a brief &lt;a href="/2012/08/20/arrays-in-postgres/">feature highlight on Postgres arrays&lt;/a>. Since that time I&amp;rsquo;ve found myself using them with increasing regularity on small side projects. Much of this time I&amp;rsquo;m using Django, and of course don&amp;rsquo;t opt to write raw SQL just to be able to use arrays. Django actually makes it quite simple to work with arrays in Postgres with a package by &lt;a href="http://www.niwi.be/about.html">Andrey Antukh&lt;/a>. Let&amp;rsquo;s get started by installing two libraries:&lt;/p>
&lt;pre>&lt;code>pip install djorm-ext-pgarray
pip install djorm-ext-expressions
&lt;/code>&lt;/pre>
&lt;p>The first library is for support for the array field type, the second allows us to more easily mix bits of SQL within the Django ORM.&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>Now within our &lt;code>models.py&lt;/code> we&amp;rsquo;ll want to import the library; then we&amp;rsquo;ll have an entirely new field type available to us:&lt;/p>
&lt;pre>&lt;code>from djorm_pgarray.fields import ArrayField
from djorm_expressions.models import ExpressionManager

class Post(models.Model):
    subject = models.CharField(max_length=255)
    content = models.TextField()
    tags = ArrayField(dbtype=&amp;quot;varchar(255)&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>Now we can begin using this. For example to create a simple blog post:&lt;/p>
&lt;pre>&lt;code>post = Post(subject='foo', content='bar', tags=['hello','world'])
post.save()
&lt;/code>&lt;/pre>
&lt;p>In this example we&amp;rsquo;re able to tag blog posts as one normally might, without requiring an extra model to join against. Taking advantage of Andrey&amp;rsquo;s SQL Expressions library as well, we can query for blog posts that contain certain tags:&lt;/p>
&lt;pre>&lt;code>qs = Post.objects.where(
    SqlExpression(&amp;quot;tags&amp;quot;, &amp;quot;@&amp;gt;&amp;quot;, ['postgres', 'django'])
)
&lt;/code>&lt;/pre>
&lt;p>Now to show some contrast lets take a look at how you would do the same thing without using the Array field:&lt;/p>
&lt;pre>&lt;code>class Tag(models.Model):
    name = models.CharField(max_length=255)

class Post(models.Model):
    subject = models.CharField(max_length=255)
    content = models.TextField()
    tags = models.ManyToManyField(Tag)
&lt;/code>&lt;/pre>
&lt;p>Using the many-to-many relationship within Django requires you to save the Post, then add your tags, as it requires having a primary key first:&lt;/p>
&lt;pre>&lt;code>post = Post(subject='foo', content='bar')
post.save()
hello, _ = Tag.objects.get_or_create(name='hello')
world, _ = Tag.objects.get_or_create(name='world')
post.tags.add(hello, world)
&lt;/code>&lt;/pre>
&lt;p>This means we have multiple database calls to create it, and similarly querying it is less clean as well:&lt;/p>
&lt;pre>&lt;code>posts = Post.objects.filter(tags__name=&amp;quot;hello&amp;quot;).distinct()
&lt;/code>&lt;/pre>
&lt;p>This would give us all posts that have the tag hello in them. We could also search for multiple tags:&lt;/p>
&lt;pre>&lt;code>posts = Post.objects.filter(tags__name__in=[&amp;quot;hello&amp;quot;,&amp;quot;world&amp;quot;]).distinct()
&lt;/code>&lt;/pre>
&lt;p>In the latter case distinct is especially important, since it could return a post twice if it&amp;rsquo;s tagged with both hello and world.&lt;/p>
&lt;p>In addition to improved performance, there are far fewer gotchas when dealing with a single array datatype, making arrays a quick but great win in certain cases like the above.&lt;/p></description></item><item><title>Redis in my Postgres</title><link>/2012/10/18/connecting_to_redis_from_postgres/</link><pubDate>Thu, 18 Oct 2012 12:55:56 -0800</pubDate><guid>/2012/10/18/connecting_to_redis_from_postgres/</guid><description>&lt;p>Yesterday there was a post which hit &lt;a href="http://news.ycombinator.com/item?id=4664178">Hacker News&lt;/a> that talked about using &lt;a href="http://www.citusdata.com/blog/51-run-sql-on-mongodb">SQL to access Mongo&lt;/a>. While this is powerful, I think much of the true value was entirely missed within the post.&lt;/p>
&lt;p>SQL is an expressive language, though people are often okay with accessing Mongo data through its own ORM. The real value is that you could actually query the data from within Postgres then join across your data stores, without having to do some ETL process to move data around. Think&amp;hellip; joining sales data from Postgres with user reviews stored in Mongo or searching for visits to a website (retained in redis) against purchases by user in Postgres.&lt;/p>
&lt;p>The mechanism pointed out was a MongoDB Foreign Data Wrapper. A Foreign Data Wrapper, or FDW, essentially lets you connect to an external datastore from within a Postgres database. In addition to the Mongo FDW released the other day there are many others. For example Postgres 9.0 and up ships with one called &lt;code>dblink&lt;/code>, which lets you query and join across two different Postgres databases. Beyond that there&amp;rsquo;s support for a variety of other data stores including some you may have never expected:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://pgxn.org/dist/redis_fdw/">Redis&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/file_textarray_fdw/">Textfile&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/mysql_fdw/">MySQL&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/oracle_fdw/">Oracle&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/odbc_fdw/">ODBC&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/ldap_fdw/">LDAP&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/twitter_fdw/">Twitter&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/tag/fdw/">More&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Let&amp;rsquo;s look at actually getting the Redis one running, then see what some of its power really looks like. First we have to get the code and build it:&lt;/p>
&lt;pre>&lt;code>git clone git://github.com/antirez/hiredis.git
cd hiredis
make
sudo make install
&lt;/code>&lt;/pre>
&lt;p>Then download the FDW from &lt;a href="http://pgxn.org/dist/redis_fdw/">PGXN&lt;/a> and build it:&lt;/p>
&lt;pre>&lt;code>PATH=/Applications/Postgres.app/Contents/MacOS/bin/:$PATH USE_PGXS=1 make
sudo PATH=/Applications/Postgres.app/Contents/MacOS/bin/:$PATH USE_PGXS=1 make install
&lt;/code>&lt;/pre>
&lt;p>Now you&amp;rsquo;ll want to connect to your Postgres database using &lt;code>psql&lt;/code>, enable the extension, and finally create your foreign table pointing to your Redis server. This assumes a locally running Redis, though you could connect to a remote one just as easily:&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION redis_fdw;
CREATE SERVER redis_server
FOREIGN DATA WRAPPER redis_fdw
OPTIONS (address '127.0.0.1', port '6379');
CREATE FOREIGN TABLE redis_db0 (key text, value text)
SERVER redis_server
OPTIONS (database '0');
CREATE USER MAPPING FOR PUBLIC
SERVER redis_server
OPTIONS (password 'secret');
&lt;/code>&lt;/pre>
&lt;p>With &lt;code>\dt&lt;/code> we can now see the list of all of our tables:&lt;/p>
&lt;pre>&lt;code># \dt
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+-------
public | products | table | craig
public | purchases | table | craig
public | users | table | craig
(3 rows)
&lt;/code>&lt;/pre>
&lt;p>And with a full &lt;code>\d&lt;/code> we can see not just the tables, but also that we have a foreign table as well:&lt;/p>
&lt;pre>&lt;code># \d
List of relations
Schema | Name | Type | Owner
--------+------------------+---------------+-------
public | products | table | craig
public | purchases | table | craig
public | redis_db0 | foreign table | craig
public | users | table | craig
(4 rows)
&lt;/code>&lt;/pre>
&lt;p>Finally we can actually query against it:&lt;/p>
&lt;pre>&lt;code># SELECT * from redis_db0 limit 5;
key | value
---------+-------
user_40 | 44
user_41 | 32
user_42 | 11
user_43 | 3
user_80 | 7
(5 rows)
&lt;/code>&lt;/pre>
&lt;p>Or more interestingly we can join it against other parts of our data and filter accordingly. Below we&amp;rsquo;ll show users that have logged in more than 40 times:&lt;/p>
&lt;pre>&lt;code>SELECT
id,
email,
value as visits
FROM
users,
redis_db0
WHERE
('user_' || cast(id as text)) = cast(redis_db0.key as text)
AND cast(value as int) &amp;gt; 40;
id | email | visits
----+----------------------------+--------
40 | Cherryl.Crissman@gmail.com | 44
44 | Brady.Paramo@gmail.com | 44
46 | Laronda.Razor@yahoo.com | 44
47 | Karole.Sosnowski@gmail.com | 44
12 | Jami.Jeon@yahoo.com | 49
14 | Jenee.Morrissey@gmail.com | 47
16 | Yuki.Alber@yahoo.com | 48
18 | Marquis.Tartaglia@aol.com | 44
31 | Collin.Parrilla@gmail.com | 46
39 | Nydia.Bukowski@aol.com | 47
2 | Gaye.Monteith@aol.com | 48
6 | Letitia.Tripodi@aol.com | 41
(12 rows)
&lt;/code>&lt;/pre>
&lt;p>While several of the current FDWs are not production ready yet, some are more battle tested, including the dblink, textfile, ODBC and MySQL ones.&lt;/p></description></item><item><title>Postgres Pooling with Django</title><link>/2012/10/02/Postgres-Pooling-with-Django/</link><pubDate>Tue, 02 Oct 2012 12:55:56 -0800</pubDate><guid>/2012/10/02/Postgres-Pooling-with-Django/</guid><description>&lt;p>A feature that&amp;rsquo;s glaringly missing within Django, and common in many other frameworks including many Java frameworks and Rails, is connection pooling for your database connection. As most Django users are Postgres users the default answer is to use something like pgPool or pgBouncer. These are tools that you can run that will persist a connection to your Postgres database, intended to offer:&lt;/p>
&lt;ul>
&lt;li>Connection Pooling&lt;/li>
&lt;li>Replication&lt;/li>
&lt;li>Load Balancing&lt;/li>
&lt;/ul>
&lt;p>&lt;em>It&amp;rsquo;s of note that pgBouncer is intended very specifically for pooling, while pgPool does much more.&lt;/em>&lt;/p>
&lt;p>Each of these can come with caveats though. If there are issues within your network they may not re-establish the connection properly. They also are not known to handle SSL renegotiation very well. Finally, running one more piece of software seems like a lot of overhead simply to reduce the connection time to your database.&lt;/p>
&lt;h2 id="is-connection-time-a-real-problem" >
&lt;div>
Is connection time a real problem?
&lt;/div>
&lt;/h2>
&lt;p>Given a well refined app, with a well refined schema and appropriate indexes, your views should be doing things pretty quickly. Even so, without some form of connection pooling, running in a cloud environment (in this case Heroku) your application performance looks like:&lt;/p>
&lt;p>You&amp;rsquo;ll notice that about 50% of our request time was in Postgres. The hard part to see is how much of this is actually doing something. In this case it&amp;rsquo;s issuing some very basic queries then rendering a very basic view.&lt;/p>
&lt;h2 id="the-solution" >
&lt;div>
The solution
&lt;/div>
&lt;/h2>
&lt;p>By using something in &lt;em>the other Python ORM&lt;/em>, SQLAlchemy, we can take advantage of its connection pooling. Large thanks to &lt;a href="https://twitter.com/kennethreitz">Kenneth Reitz&lt;/a> for packaging this up into an easy to install and easy to use format as a Django DB backend. Using django_postgrespool to take advantage of connection pooling, we can then see the performance after:&lt;/p></description></item><item><title>Understanding Postgres Performance</title><link>/2012/10/01/Understanding-Postgres-Performance/</link><pubDate>Mon, 01 Oct 2012 12:55:56 -0800</pubDate><guid>/2012/10/01/Understanding-Postgres-Performance/</guid><description>&lt;p>&lt;em>Update: there&amp;rsquo;s a more &lt;a href="/2013/01/10/more-on-postgres-performance/">recent post that expands further on where to start optimizing&lt;/a> specific queries, and of course if you want to dig into optimizing your infrastructure &lt;a href="http://www.amazon.com/dp/184951030X?tag=mypred-20">High Performance PostgreSQL is still a great read&lt;/a>.&lt;/em>&lt;/p>
&lt;p>For many application developers their database is a black box. Data goes in, comes back out, and in between developers hope it&amp;rsquo;s a pretty short time span. Without becoming a DBA there are a few pieces of data that most application developers can easily grok which will help them understand if their database is performing adequately. This post will provide some quick tips that allow you to determine whether your database performance is slowing down your app, and if so what you can do about it.&lt;/p>
&lt;h3 id="understanding-your-cache-and-its-hit-rate" >
&lt;div>
Understanding your Cache and its Hit Rate
&lt;/div>
&lt;/h3>
&lt;p>The typical rule for most applications is that only a fraction of their data is regularly accessed. As with many other things, data can tend to follow the 80/20 rule, with 20% of your data accounting for 80% of the reads, and oftentimes it&amp;rsquo;s higher than this. Postgres itself actually tracks access patterns of your data and will on its own keep frequently accessed data in cache. Generally you want your database to have a cache hit rate of about 99%. You can find your cache hit rate with:&lt;/p>
&lt;pre>&lt;code>SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM
pg_statio_user_tables;
&lt;/code>&lt;/pre>
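&lt;p>The ratio that query computes is simply cache hits divided by total block fetches. As a sketch of the same arithmetic in Python (the counter values here are made up for illustration):&lt;/p>

```python
def cache_hit_ratio(heap_read: int, heap_hit: int) -> float:
    """Fraction of heap block fetches served from Postgres's buffer cache."""
    total = heap_hit + heap_read
    return heap_hit / total if total else 0.0

# Hypothetical counters: 990,000 cache hits vs 10,000 disk reads.
print(cache_hit_ratio(heap_read=10_000, heap_hit=990_000))  # 0.99
```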
&lt;!-- raw HTML omitted -->
&lt;p>We can see in this &lt;a href="https://postgres.heroku.com/dataclips/jfrtizxdthixfxkcdesxwesiibly">dataclip&lt;/a> that the cache rate for &lt;a href="https://postgres.heroku.com?utm_source=referral&amp;amp;utm_medium=content&amp;amp;utm_campaign=craigkerstiens">Heroku Postgres&lt;/a> is 99.99%. If you find yourself with a ratio significantly lower than 99% then you likely want to consider increasing the cache available to your database, you can do this on Heroku Postgres by &lt;a href="https://devcenter.heroku.com/articles/fast-database-changeovers?utm_source=referral&amp;amp;utm_medium=content&amp;amp;utm_campaign=craigkerstiens">performing a fast database changeover&lt;/a> or on something like EC2 by performing a dump/restore to a larger instance size.&lt;/p>
&lt;h3 id="understanding-index-usage" >
&lt;div>
Understanding Index Usage
&lt;/div>
&lt;/h3>
&lt;p>The other primary piece for improving performance is &lt;a href="https://devcenter.heroku.com/articles/postgresql-indexes?utm_source=referral&amp;amp;utm_medium=content&amp;amp;utm_campaign=craigkerstiens">indexes&lt;/a>. Several frameworks will add indexes on your primary keys, though if you&amp;rsquo;re searching on other fields or joining heavily you may need to manually add such indexes.&lt;/p>
&lt;p>Indexes are most valuable across large tables as well. While accessing data from cache is faster than disk, even data within memory can be slow if Postgres must parse through hundreds of thousands of rows to identify if they meet a certain condition. To generate a list of your tables in your database with the largest ones first and the percentage of time which they use an index you can run:&lt;/p>
&lt;pre>&lt;code>SELECT
relname,
100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
n_live_tup rows_in_table
FROM
pg_stat_user_tables
WHERE
seq_scan + idx_scan &amp;gt; 0
ORDER BY
n_live_tup DESC;
&lt;/code>&lt;/pre>
&lt;p>While there is no perfect answer, if you&amp;rsquo;re not somewhere around 99% on any table over 10,000 rows you may want to consider adding an index. When examining where to add an index you should look at what kind of queries you&amp;rsquo;re running. Generally you&amp;rsquo;ll want to add indexes where you&amp;rsquo;re looking up by some other id or on values that you&amp;rsquo;re commonly filtering on such as created_at fields.&lt;/p>
&lt;p>Pro tip: If you&amp;rsquo;re adding an index on a production database use &lt;code>CREATE INDEX CONCURRENTLY&lt;/code> to have it build your index in the background and not hold a lock on your table. The limitation to creating indexes &lt;a href="http://www.postgresql.org/docs/9.1/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY">concurrently&lt;/a> is they can typically take 2-3 times longer to create and can&amp;rsquo;t be run within a transaction. For any large production site though, these trade-offs are worth it for the experience of your end users.&lt;/p>
&lt;h3 id="heroku-dashboard-as-an-example" >
&lt;div>
Heroku Dashboard as an Example
&lt;/div>
&lt;/h3>
&lt;p>Looking at a real world example of the recently launched Heroku dashboard, we can run this query and see our results:&lt;/p>
&lt;pre>&lt;code># SELECT relname, 100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used, n_live_tup rows_in_table FROM pg_stat_user_tables ORDER BY n_live_tup DESC;
relname | percent_of_times_index_used | rows_in_table
---------------------+-----------------------------+---------------
events | 0 | 669917
app_infos_user_info | 0 | 198218
app_infos | 50 | 175640
user_info | 3 | 46718
rollouts | 0 | 34078
favorites | 0 | 3059
schema_migrations | 0 | 2
authorizations | 0 | 0
delayed_jobs | 23 | 0
&lt;/code>&lt;/pre>
&lt;p>From this we can see the events table, which has around 700,000 rows, has no indexes that have been used. From here you could investigate within my application and see some of the common queries that are used; one example is pulling the events for this blog post which you are reading. You can see your &lt;a href="https://postgresguide.com/performance/explain.html?utm_source=referral&amp;amp;utm_medium=content&amp;amp;utm_campaign=craigkerstiens">execution plan&lt;/a> by running &lt;a href="https://postgresguide.com/performance/explain.html?utm_source=referral&amp;amp;utm_medium=content&amp;amp;utm_campaign=craigkerstiens">&lt;code>EXPLAIN ANALYZE&lt;/code>&lt;/a>, which gives you a better idea of the performance of a specific query:&lt;/p>
&lt;pre>&lt;code>EXPLAIN ANALYZE SELECT * FROM events WHERE app_info_id = 7559; QUERY PLAN
-------------------------------------------------------------------
Seq Scan on events (cost=0.00..63749.03 rows=38 width=688) (actual time=2.538..660.785 rows=89 loops=1)
Filter: (app_info_id = 7559)
Total runtime: 660.885 ms
&lt;/code>&lt;/pre>
&lt;p>Given there&amp;rsquo;s a sequential scan across all that data this is an area we can optimize with an index. We can add our index concurrently to prevent locking on that table and then see how performance is:&lt;/p>
&lt;pre>&lt;code>CREATE INDEX CONCURRENTLY idx_events_app_info_id ON events(app_info_id);
EXPLAIN ANALYZE SELECT * FROM events WHERE app_info_id = 7559;
----------------------------------------------------------------------
Index Scan using idx_events_app_info_id on events (cost=0.00..23.40 rows=38 width=688) (actual time=0.021..0.115 rows=89 loops=1)
Index Cond: (app_info_id = 7559)
Total runtime: 0.200 ms
&lt;/code>&lt;/pre>
&lt;p>While the improvement in this single query is obvious, we can also examine the results in &lt;a href="https://addons.heroku.com/newrelic">New Relic&lt;/a> and see that we&amp;rsquo;ve significantly reduced our time spent in the database with the addition of this and a few other indexes:&lt;/p>
&lt;p>&lt;img src="https://f.cl.ly/items/2x3g2h2220162C2M0w0r/Pasted%20Image%2010:1:12%208:55%20AM-2.png" alt="NewRelicGraph">&lt;/p>
&lt;h3 id="index-cache-hit-rate" >
&lt;div>
Index Cache Hit Rate
&lt;/div>
&lt;/h3>
&lt;p>Finally to combine the two if you&amp;rsquo;re interested in how many of your indexes are within your cache you can run:&lt;/p>
&lt;pre>&lt;code>SELECT
  sum(idx_blks_read) as idx_read,
  sum(idx_blks_hit) as idx_hit,
  sum(idx_blks_hit)::numeric / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0) as ratio
FROM
  pg_statio_user_indexes;
&lt;/code>&lt;/pre>
&lt;p>Generally, you should expect this to be around 99%, similar to your regular cache hit rate.&lt;/p>
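&lt;p>For reference, the &amp;ldquo;regular&amp;rdquo; cache hit rate mentioned here can be computed the same way from &lt;code>pg_statio_user_tables&lt;/code>; a sketch using the heap block counters:&lt;/p>
&lt;pre>&lt;code>SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit) as heap_hit,
  sum(heap_blks_hit)::numeric / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
FROM
  pg_statio_user_tables;
&lt;/code>&lt;/pre>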
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Arrays in Postgres</title><link>/2012/08/20/Arrays-in-Postgres/</link><pubDate>Mon, 20 Aug 2012 12:55:56 -0800</pubDate><guid>/2012/08/20/Arrays-in-Postgres/</guid><description>&lt;p>Postgres out of the box has an abundance of datatypes, from standard &lt;a href="http://www.postgresql.org/docs/9.1/static/datatype.html#DATATYPE-NUMERIC">numeric datatypes&lt;/a> to &lt;a href="http://www.postgresql.org/docs/9.1/static/datatype-geometric.html">geometric&lt;/a> or even &lt;a href="http://www.postgresql.org/docs/9.1/static/datatype-net-types.html">network datatypes&lt;/a>. With extensions you can get even more out of it, as earlier discussed with &lt;a href="http://craigkerstiens.com/2012/06/11/schemaless-django/">hStore&lt;/a>. Though with all of these datatypes it&amp;rsquo;s easy to miss some of them; in fact, one of my favorites is often missed entirely. The &lt;a href="http://www.postgresql.org/docs/9.1/static/arrays.html">Array&lt;/a> datatype lets you do just as you&amp;rsquo;d expect: store an array inside Postgres. With this you can often get some of the functionality you&amp;rsquo;d want in a single table when you might traditionally have expanded to multiple tables.&lt;/p>
&lt;p>The broader question may be why you&amp;rsquo;d actually want to use an array. One good reason may be that if you&amp;rsquo;re an application developer, it&amp;rsquo;s how you think of your data, so why not model it the same way. As you&amp;rsquo;ll see below it can be easier than joining and aggregating across a set of rows. Also, depending on your case, your performance could be improved, though mileage may vary here as it does depend on the data you&amp;rsquo;re storing.&lt;/p>
&lt;p>First a bit of a hacky example&amp;hellip; Let&amp;rsquo;s say you have a basic website that sells stuff, and instead of having a purchase ID and a total you want to include the quantity, id, and price of each item in a single row. With a bit of a messy foreign key (using a decimal) you could store all of this within a single row:&lt;/p>
&lt;pre>&lt;code>CREATE TABLE purchases (
id integer NOT NULL,
user_id integer,
items decimal(10,2) [100][1],
occurred_at timestamp
);
&lt;/code>&lt;/pre>
&lt;p>With this table I could have an array that holds multiple records of:&lt;/p>
&lt;ul>
&lt;li>The item purchased&lt;/li>
&lt;li>The quantity&lt;/li>
&lt;li>The price&lt;/li>
&lt;/ul>
&lt;!-- raw HTML omitted -->
&lt;p>An insert to this table would look something like:&lt;/p>
&lt;pre>&lt;code>INSERT INTO purchases VALUES (1, 37, '\{\{15.0, 1.0, 25.0\}, \{15.0, 1.0, 25.0\}\}', now());
INSERT INTO purchases VALUES (2, 2, '{{11.0, 1.0, 4.99}}', now());
&lt;/code>&lt;/pre>
&lt;p>You can see a full example with UDFs to compute totals &lt;a href="https://github.com/craigkerstiens/postgres-demo">here&lt;/a>.&lt;/p>
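&lt;p>As a rough sketch of what such a total computation could look like (assuming each inner array holds &lt;code>{item_id, quantity, price}&lt;/code> as in the inserts above), you can walk the first dimension of the array with &lt;code>generate_subscripts&lt;/code>:&lt;/p>
&lt;pre>&lt;code>-- Sum quantity * price across each item row of the array
SELECT p.id,
       sum(p.items[i][2] * p.items[i][3]) AS total
FROM purchases p,
     generate_subscripts(p.items, 1) AS i
GROUP BY p.id;
&lt;/code>&lt;/pre>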
&lt;p>A more practical example may actually be using an array for tags. If you were to tag your purchases:&lt;/p>
&lt;pre>&lt;code>CREATE TABLE products (
id integer NOT NULL,
title character varying(255),
description text,
tags text[],
price numeric(10,2)
);
&lt;/code>&lt;/pre>
&lt;p>You could then query those just as you&amp;rsquo;d expect with a basic select statement, or could even expand the array to have an individual result per row:&lt;/p>
&lt;pre>&lt;code> SELECT title, unnest(tags) items
FROM products
&lt;/code>&lt;/pre>
&lt;p>Protip: If you&amp;rsquo;re using arrays you can also use Postgres&amp;rsquo; &lt;a href="http://www.postgresql.org/docs/9.1/static/textsearch-indexes.html">GIN and GiST&lt;/a> indexes to quickly search for products that contain certain tags. Given an index, you could search where a product is tagged with certain ids:&lt;/p>
&lt;pre>&lt;code>-- A GIN index makes these containment searches fast
CREATE INDEX idx_products_tags ON products USING GIN (tags);

-- Search where product contains tag ids 1 AND 2
-- (tags is text[], so the values are quoted)
SELECT *
FROM products
WHERE tags @&amp;gt; ARRAY['1', '2'];

-- Search where product contains tag ids 1 OR 2
SELECT *
FROM products
WHERE tags &amp;amp;&amp;amp; ARRAY['1', '2'];
&lt;/code>&lt;/pre></description></item><item><title>Rapid API Prototyping with Heroku Postgres Dataclips</title><link>/2012/07/19/Rapid-API-Prototyping-with-Heroku-Postgres-Dataclips/</link><pubDate>Thu, 19 Jul 2012 12:55:56 -0800</pubDate><guid>/2012/07/19/Rapid-API-Prototyping-with-Heroku-Postgres-Dataclips/</guid><description>&lt;p>For small and large applications there often comes a time when you&amp;rsquo;re busy creating an API. The API creation process usually takes a form something like: design your API, implement your API, test and evaluate, rinse and repeat. Historically, until you&amp;rsquo;ve fully implemented the API you can&amp;rsquo;t see how you truly feel about the result, causing this cycle to take longer than it should. Heroku Postgres has &lt;a href="https://postgres.heroku.com/blog/past/2012/1/31/simple_data_sharing_with_data_clips/">Dataclips&lt;/a>, which (among other things) can be used for quickly prototyping APIs. &lt;a href="https://postgres.heroku.com/dataclips">Dataclips&lt;/a> allows you to easily share data, but more importantly consume it in a form much like you would a RESTful API. Let&amp;rsquo;s take a look at how this would work:&lt;/p>
&lt;h2 id="given-a-schema" >
&lt;div>
Given a schema
&lt;/div>
&lt;/h2>
&lt;p>From the screenshot of the schema above we can see we have a few tables. These tables are the complete works of Shakespeare thanks to &lt;a href="http://www.opensourceshakespeare.org/downloads/">opensourceshakespeare&lt;/a>. Let&amp;rsquo;s take a couple of hypothetical endpoints we&amp;rsquo;ve decided on that we&amp;rsquo;d like to expose for users and test as an API.&lt;/p>
&lt;ul>
&lt;li>The number of works per year&lt;/li>
&lt;li>Drone factory (this is a fun one courtesy of Richard Morrison - &lt;a href="https://twitter.com/mozz100">@mozz100&lt;/a>): essentially, who has the longest paragraphs on average in his works.&lt;/li>
&lt;/ul>
&lt;!-- raw HTML omitted -->
&lt;h2 id="create-a-dataclip" >
&lt;div>
Create a dataclip
&lt;/div>
&lt;/h2>
&lt;p>Now we open up our database on Heroku Postgres and go down near the bottom to the dataclips section. Click the plus to create a new dataclip and we can enter our queries.&lt;/p>
&lt;pre>&lt;code>SELECT
year,
count(*)
FROM
works
GROUP BY
year
ORDER BY
year ASC
&lt;/code>&lt;/pre>
&lt;p>Click &lt;code>Create Clip&lt;/code> and you&amp;rsquo;ll be redirected to your new dataclip. This unique URL will always return the results of that query and if you want to shift it to a real time API that re-runs the query you can flip the &lt;code>now&lt;/code> switch. For my simple example above my url for this dataclip is now &lt;a href="https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex">https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex&lt;/a>.&lt;/p>
&lt;h2 id="using-the-dataclip-as-a-prototype-api" >
&lt;div>
Using the dataclip as a prototype API
&lt;/div>
&lt;/h2>
&lt;p>There are many different use cases for dataclips, but of course for our sake we care about prototyping an API instead of sharing the data. To do this you can simply append the format you want to the url above and test as if it were an API:&lt;/p>
&lt;ul>
&lt;li>JSON - &lt;a href="https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex.json">https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex.json&lt;/a>&lt;/li>
&lt;li>CSV - &lt;a href="https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex.csv">https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex.csv&lt;/a>&lt;/li>
&lt;li>XLS - &lt;a href="https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex.xls">https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex.xls&lt;/a>&lt;/li>
&lt;/ul>
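&lt;p>Since these are plain HTTP endpoints, you can consume them from any HTTP client while prototyping, for example:&lt;/p>
&lt;pre>&lt;code>curl https://dataclips.heroku.com/fcroecrluhwltbjinstfqmwyneex.json
&lt;/code>&lt;/pre>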
&lt;p>Essentially anything you can bake down to a query (much like you would in your app&amp;rsquo;s API layer) you can expose in this form to quickly test. For a more complicated example you can check out the &lt;a href="https://postgres.heroku.com/dataclips/tzvzbnnijzezyvzwjeoibwdpfjqb">Drone factory query&lt;/a>.&lt;/p></description></item><item><title>Protips for Conference Talks</title><link>/2012/06/19/pro-tips-for-conference-talks/</link><pubDate>Tue, 19 Jun 2012 12:55:56 -0800</pubDate><guid>/2012/06/19/pro-tips-for-conference-talks/</guid><description>&lt;p>A few weeks ago I was sitting at the hotel in Zurich with &lt;a href="http://www.twitter.com/jacobian">Jacob Kaplan-Moss&lt;/a> prior to &lt;a href="http://klewel.com/conferences/djangocon-2012/">DjangoCon EU&lt;/a> enjoying a beer, talking about Django, and discussing a bit about our upcoming talks for the conference. He talked briefly about his upcoming &lt;a href="http://klewel.com/conferences/djangocon-2012/index.php?talkID=2">keynote&lt;/a> and how he was doing something different, including essentially 5 mini-talks. This seemed interesting enough, but the part that surprised me was when Jacob said, &amp;ldquo;I&amp;rsquo;m among friends here so it&amp;rsquo;ll be a good place to test this format.&amp;rdquo; Many if not all in the community know who Jacob is as one of the creators of Django, though still, to be &amp;ldquo;among friends&amp;rdquo; at a roughly 300-person conference surprised me. However, as someone that&amp;rsquo;s keynoted several times, spoken at conferences for many years, and is familiar with many people in the community, for the 150-200 people there he had not met before, he was still truly among friends. While giving a keynote is never an easy feat, being among friends seems to ease the worry of it ahead of time.&lt;/p>
&lt;p>Saturday night there was a bit of conversation on twitter that had some related discussion. In the last-minute rush for DjangoCon US talk submissions, a few people that have been involved in the community for some time discussed submitting their first talk proposals. In parallel to that was some discussion around diversity, and I volunteered the idea of not including presenters&amp;rsquo; names in the list when reviewing and voting on talks. While both of the above are controversial topics alone, I hope they can be left to another time. The key idea that emerged that can be helpful to anyone looking to submit a talk to a conference is how the &amp;ldquo;pros do it&amp;rdquo;, as &lt;a href="http://www.twitter.com/jdunck">Jeremy Dunck&lt;/a> put it.&lt;/p>
&lt;p>So without further ado, and hopefully without speaking too much for him, here&amp;rsquo;s likely why Jacob viewed his 300-person keynote as being among friends:&lt;/p>
&lt;h2 id="1-start-small" >
&lt;div>
1. Start small
&lt;/div>
&lt;/h2>
&lt;p>Whether it&amp;rsquo;s practicing the talk itself or writing the abstract for a proposal, practicing each step lets you refine it well ahead of time. In my experience, providing a talk description for a meetup can often be far harder than for a conference. For a meetup I feel confined to 2-3 sentences, versus a solid paragraph or two for an abstract. Yet I still have to make it just as exciting, because of course I don&amp;rsquo;t want only 4 people to show up to the meetup because it sounds uninteresting. In the case I&amp;rsquo;m most familiar with, &lt;a href="http://www.djangocon.us/">DjangoCon&lt;/a> and DjangoCon.eu both happen once a year, though many smaller regional conferences related to Python exist, and especially meetup groups:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;h3 id="regional-conferences" >
&lt;div>
Regional Conferences
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.pytexas.org/2011/">PyTexas&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pyarkansas.wordpress.com/">PyArkansas&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://pygotham.org/">PyGotham&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pyohio.org/">PyOhio&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="meetups" >
&lt;div>
Meetups
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.meetup.com/pythonkc/">Kansas City&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.djangonyc.org/">NYC&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.meetup.com/The-San-Francisco-Django-Meetup-Group/">SF&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.meetup.com/austinwebpythonusergroup/">Austin&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.meetup.com/pdxpython/">Portland&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.meetup.com/python-atlanta/">Atlanta&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://chipy.org/">Chicago&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="other-avenues" >
&lt;div>
Other avenues
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://igniteshow.com/cities/all">Ignite&lt;/a>&lt;/li>
&lt;li>At your office to colleagues&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Check &lt;a href="http://lanyrd.com/topics/python/">Lanyrd&lt;/a> for a list of relevant events where you might be able to start&lt;/em>&lt;/p>
&lt;p>Take your pick, there&amp;rsquo;s a conference or a meet-up near you. Even better if you can manage to do both at some point.&lt;/p>
&lt;h2 id="2-get-feedback-early" >
&lt;div>
2. Get feedback early
&lt;/div>
&lt;/h2>
&lt;p>If you&amp;rsquo;ve been around the community you may know what talks would be interesting. Though even if you&amp;rsquo;ve been involved in the community and not given a talk at a conference before, this may be harder to come up with than you realize. If you&amp;rsquo;re thinking about submitting a talk, it&amp;rsquo;s likely you have many people you can get feedback from. Do this early and often; if you&amp;rsquo;re submitting a talk it&amp;rsquo;s likely you have something interesting to say, and the hardest part can be having that come across succinctly in an abstract. Sure, there are certain hot topics, such as in the Django community:&lt;/p>
&lt;ul>
&lt;li>Diversity&lt;/li>
&lt;li>Class Based Views&lt;/li>
&lt;li>Testing&lt;/li>
&lt;li>APIs/Services&lt;/li>
&lt;/ul>
&lt;p>At almost every Django and Python conference there will probably be 3+ talks submitted on each of these topics. Why does yours stand out differently? It&amp;rsquo;s likely in how you position the problem and the answer you can deliver in a talk, which isn&amp;rsquo;t a quality of the talk but rather of the abstract.&lt;/p>
&lt;h2 id="3-focus-on-your-talk-more-than-others" >
&lt;div>
3. Focus on your talk more than others
&lt;/div>
&lt;/h2>
&lt;p>I&amp;rsquo;m not sure this is the case for all presenters at every conference, but every time I&amp;rsquo;ve paid attention to it, presenters seem to make themselves slightly less available during a conference. Or at least they do this up until they give their talk. Often you&amp;rsquo;ll have several of the presenters missing other talks while they&amp;rsquo;re holed up in their room prepping, or maybe they&amp;rsquo;re enjoying a hallway track with other presenters. Either way they&amp;rsquo;re likely walking through their slides and key talking points in some form. As a presenter the dynamic of the conference changes ever so slightly: you spend a little less time on all of the talks that happen (fortunately videos help for catching up later). Though on the plus side, you also get the opportunity to have engaging conversations after your talk about a topic you hopefully find interesting.&lt;/p>
&lt;h2 id="4-get-the-critical-feedback" >
&lt;div>
4. Get the critical feedback
&lt;/div>
&lt;/h2>
&lt;p>People know it&amp;rsquo;s nerve-wracking to give a talk in front of a crowd. Each time you do it, it becomes easier, but in my experience it&amp;rsquo;s never as easy as a conversation over a cup of coffee. Because of this, most people will be quite encouraging of whatever job you do. This isn&amp;rsquo;t a bad thing; encouragement is good, and your talk will be better the second time you give it. However, by getting the critical feedback out of people you&amp;rsquo;ll be able to improve your talk much more the second/third/fourth time around.&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted --></description></item><item><title>Schemaless Postgres in Django</title><link>/2012/06/11/schemaless-django/</link><pubDate>Mon, 11 Jun 2012 12:55:56 -0800</pubDate><guid>/2012/06/11/schemaless-django/</guid><description>&lt;p>Earlier this week while I was at &lt;a href="http://lanyrd.com/2012/djangocon-europe/">DjangoCon EU&lt;/a> there seemed to be a surprising amount of talk about MongoDB. My problem with this isn&amp;rsquo;t with MongoDB, but with the assumption that only Mongo can solve what you&amp;rsquo;re looking for. By far the most commonly wanted feature is schemaless storage. It gives people flexibility in their data model and lets them iterate quickly. While I still opt for relational models that map cleanly to a relational database, there are cases where developers may want schemaless. I gave a quick lightning talk on this with slides &lt;a href="https://speakerdeck.com/u/craigkerstiens/p/django-and-hstore">here&lt;/a>, but it is worth recapping.&lt;/p>
&lt;p>The example given by &lt;a href="http://www.twitter.com/pydanny">pydanny&lt;/a> was a product catalog. You may have different items you want to store for a catalog. Lets take an example below:&lt;/p>
&lt;pre>&lt;code>django_pony = {'name': 'Django Pony', 'rating': '5'}
pink_pony = {'name': 'Pink Pony', 'rating': '4', 'color': 'pink'}
&lt;/code>&lt;/pre>
&lt;p>In the case of a product catalog it&amp;rsquo;s understandable that you don&amp;rsquo;t want to normalize every possible spec for the product. The argument for Mongo is commonly that you can easily work with this data model. Admittedly it is quite simple:&lt;/p>
&lt;pre>&lt;code>from pymongo import Connection
connection = Connection()
django_pony = {'name': 'Django Pony', 'rating': '5'}
connection.product.insert(django_pony)
&lt;/code>&lt;/pre>
&lt;p>The problem is that this assumes other schemaless options don&amp;rsquo;t exist or are inferior.&lt;/p>
&lt;h2 id="enter-hstore" >
&lt;div>
Enter hStore
&lt;/div>
&lt;/h2>
&lt;p>&lt;a href="http://www.postgresql.org/docs/8.4/static/hstore.html">hStore&lt;/a> is a column type within Postgres. It is a key value store that allows you to store a dictionary, with text values. It alone is not a full document store replacement, but allows for flexibility in your data model where you need it while letting you use relational models elsewhere. Its not exactly new within Postgres either, as its been available since 8.4, however its recently become easier to work with and is supported in some form or another by more frameworks.&lt;/p>
&lt;p>To do the same as above we only need to do a few steps:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>Within your Postgres 9.1 or higher database:&lt;/p>
&lt;pre>&lt;code>create extension hstore;
&lt;/code>&lt;/pre>
&lt;p>&lt;em>If you don&amp;rsquo;t have Postgres already if you&amp;rsquo;re on a Mac quickly grab and install &lt;a href="http://postgresapp.com">Postgres.app&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Now for actually using it within your Django application. You first need to install django-hstore to your project:&lt;/p>
&lt;pre>&lt;code>pip install django-hstore
&lt;/code>&lt;/pre>
&lt;p>Then you can add it to your models:&lt;/p>
&lt;pre>&lt;code>from django.db import models
from django_hstore import hstore

class Product(models.Model):
    name = models.CharField(max_length=250)
    data = hstore.DictionaryField(db_index=True)
    objects = hstore.Manager()

    def __unicode__(self):
        return self.name
&lt;/code>&lt;/pre>
&lt;p>Once you&amp;rsquo;ve synced your database models you can now add your products in a very similar form to the above:&lt;/p>
&lt;pre>&lt;code>Product.objects.create(name='Django Pony', data={'rating': '5'})
Product.objects.create(name='Pony', data={'color': 'pink', 'rating': '4'})
&lt;/code>&lt;/pre>
&lt;p>At this point you&amp;rsquo;ve got your schemaless data into Postgres and can interact with it. However, this is where the benefits of Postgres quickly start to come into play. In addition to the schemaless model you&amp;rsquo;re able to add indexes and filter on keys/values just as you would expect. In fact within Django it maps similarly to how it would within the ORM:&lt;/p>
&lt;pre>&lt;code>colored_ponies = Product.objects.filter(data__contains='color')
print colored_ponies[0].data['color']

favorite_ponies = Product.objects.filter(data__contains={'rating': '5'})
print favorite_ponies[0]
&lt;/code>&lt;/pre>
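&lt;p>Under the hood these filters map to hstore operators. As a sketch, assuming Django&amp;rsquo;s default table naming (here a hypothetical &lt;code>catalog_product&lt;/code> table), the equivalent raw SQL looks roughly like:&lt;/p>
&lt;pre>&lt;code>-- key existence (data__contains='color')
SELECT * FROM catalog_product WHERE data ? 'color';

-- key/value containment (data__contains={'rating': '5'})
SELECT * FROM catalog_product WHERE data @&amp;gt; 'rating=&amp;gt;5';
&lt;/code>&lt;/pre>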
&lt;p>To add indexes within Postgres we could index on the same items that we&amp;rsquo;re filtering on above:&lt;/p>
&lt;pre>&lt;code>create index on product ((data-&amp;gt;'color')) where (data-&amp;gt;'color') is not null;
create index on product ((data-&amp;gt;'rating')) where (data-&amp;gt;'rating') = '5';
&lt;/code>&lt;/pre>
&lt;p>If you need a sample project to start with immediately check out this one on &lt;a href="https://github.com/craigkerstiens/hstore-demo">github&lt;/a>.&lt;/p>
&lt;p>When evaluating any database it&amp;rsquo;s important to choose the features you&amp;rsquo;re evaluating it on, then examine further. Mongo may be great because it&amp;rsquo;s schemaless, but this doesn&amp;rsquo;t mean an RDBMS can&amp;rsquo;t be schemaless as well (and do a good job of it). In the long run, schemaless is likely to just become another feature in databases, but more on that later.&lt;/p></description></item><item><title>Why PostgreSQL Part 2</title><link>/2012/05/07/why-postgres-part-2/</link><pubDate>Mon, 07 May 2012 12:55:56 -0800</pubDate><guid>/2012/05/07/why-postgres-part-2/</guid><description>&lt;p>&lt;em>This post is a list of many of the reasons to use Postgres, much of this content as well as how to use these features will later be curated within &lt;a href="http://www.postgresguide.com">PostgresGuide.com&lt;/a>. If you need to get started check out &lt;a href="http://postgresapp.com">Postgres.app&lt;/a> for Mac, or get a Cloud instance at &lt;a href="https://postgres.heroku.com/?utm_source=referral&amp;amp;utm_medium=content&amp;amp;utm_campaign=craigkerstiens">Heroku Postgres&lt;/a> for free&lt;/em>&lt;/p>
&lt;p>Last week I did a post on the &lt;a href="/2012/04/30/why-postgres/">many reasons to use Postgres&lt;/a>. My goal with the post was two fold:&lt;/p>
&lt;ul>
&lt;li>Call out some of the historical arguments against it that don&amp;rsquo;t hold any more&lt;/li>
&lt;li>Highlight some of the awesome but more unique features that are less commonly found in databases.&lt;/li>
&lt;/ul>
&lt;p>Once I published the post, it was immediately pointed out in the comments and on &lt;a href="http://news.ycombinator.com/item?id=3910743">hacker news&lt;/a> that I missed quite a few features that I&amp;rsquo;d mostly come to take for granted. &lt;em>Perhaps this is due to so much awesomeness existing within Postgres.&lt;/em> A large thanks to everyone for calling these out. To help consolidate many of these, here&amp;rsquo;s a second list of the many reasons to use PostgreSQL:&lt;/p>
&lt;h2 id="create-index-concurrently" >
&lt;div>
Create Index Concurrently
&lt;/div>
&lt;/h2>
&lt;p>On most traditional databases when you create an index it holds a lock on the table while it creates the index. This means that the table is more or less useless during that time. When you&amp;rsquo;re starting out this isn&amp;rsquo;t a problem, but as your data grows and you then add indexes later to improve performance it could mean downtime just to add an index. Not surprisingly Postgres has a great means of adding an index without holding that lock. Simply doing &lt;a href="http://www.postgresql.org/docs/9.1/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY">&lt;code>CREATE INDEX CONCURRENTLY&lt;/code>&lt;/a> instead of &lt;code>CREATE INDEX&lt;/code> will create your index without holding the lock.&lt;/p>
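&lt;p>For example, with a hypothetical &lt;code>users&lt;/code> table, the two forms look like:&lt;/p>
&lt;pre>&lt;code>-- takes a lock that blocks writes to users while it builds
CREATE INDEX idx_users_email ON users (email);

-- builds the same index without blocking writes
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
&lt;/code>&lt;/pre>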
&lt;p>&lt;em>Of course with many features there are caveats, in the case of creating your index concurrently it does take somewhere on the order of 2-3 times longer, and cannot be done within a transaction&lt;/em>&lt;/p>
&lt;h2 id="transactional-ddl" >
&lt;div>
Transactional DDL
&lt;/div>
&lt;/h2>
&lt;p>If you&amp;rsquo;ve ever run a migration and had something break midway, whether due to a constraint or some other means, you understand what pain can come of quickly untangling it. Typically migrations on a schema are intended to be run holistically, and if they fail you want to fully roll back. Some other databases, such as recent versions of Oracle and SQL Server, do support this. And of course Postgres supports wrapping your DDL inside a transaction. This means if an error does occur you can simply roll back and have the previous DDL statements rolled back with it, leaving your schema migrations as safe as your data, and your application in a consistent state.&lt;/p>
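&lt;p>A quick sketch with a hypothetical migration:&lt;/p>
&lt;pre>&lt;code>BEGIN;
ALTER TABLE users ADD COLUMN confirmed_at timestamp;
CREATE INDEX idx_users_confirmed_at ON users (confirmed_at);
-- if either statement fails, ROLLBACK leaves the schema untouched
COMMIT;
&lt;/code>&lt;/pre>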
&lt;h2 id="foreign-data-wrappers" >
&lt;div>
Foreign Data Wrappers
&lt;/div>
&lt;/h2>
&lt;p>I talked before about other languages within your database such as Ruby or Python, but what if you wanted to talk to other databases from your database? Postgres&amp;rsquo;s foreign data wrappers allow you to fully wrap external data systems and join on them in a similar fashion as if they existed locally within the database. Here&amp;rsquo;s a sampling of just a few of the foreign data wrappers that exist:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://pgxn.org/dist/oracle_fdw/">Oracle&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/mysql_fdw/">MySQL&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/redis_fdw/">Redis&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://pgxn.org/dist/twitter_fdw/">Twitter&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>In fact you can even use &lt;a href="http://multicorn.org/">Multicorn&lt;/a> to allow you to write other foreign data wrappers in Python. An example of how this can be done, in this case with Database.com/Force.com can be found &lt;a href="http://blog.database.com/blog/2011/11/21/a-database-comforce-com-foreign-data-wrapper-for-postgresql/">here&lt;/a>&lt;/p>
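&lt;p>As a sketch of the general shape (the exact options vary by wrapper, so check each wrapper&amp;rsquo;s docs), wiring up the Redis wrapper looks roughly like:&lt;/p>
&lt;pre>&lt;code>CREATE EXTENSION redis_fdw;

CREATE SERVER redis_server
  FOREIGN DATA WRAPPER redis_fdw
  OPTIONS (address '127.0.0.1', port '6379');
&lt;/code>&lt;/pre>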
&lt;!-- raw HTML omitted -->
&lt;h2 id="conditional-constraints-and-partial-indexes" >
&lt;div>
Conditional Constraints and Partial Indexes
&lt;/div>
&lt;/h2>
&lt;p>In a similar fashion to affecting only part of your data, you may care about an index on only a portion of your data. Or you may care about placing a constraint only where a certain condition is true. Take the example case of the white pages. Within the white pages you only have one active address, but you&amp;rsquo;ve had multiple ones over recent years. You likely wouldn&amp;rsquo;t care about the past addresses being indexed, but would want everyone&amp;rsquo;s current address to be indexed. With &lt;a href="http://www.postgresql.org/docs/9.1/static/indexes-partial.html">Partial Indexes&lt;/a> this becomes simple and straightforward:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span> CREATE INDEX idx_address_current ON address &lt;span style="color:#f92672">(&lt;/span>user_id&lt;span style="color:#f92672">)&lt;/span> WHERE current IS True;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="postgres-in-the-cloud" >
&lt;div>
Postgres in the Cloud
&lt;/div>
&lt;/h2>
&lt;p>Postgres has been chosen by individual shops and been proven to scale by places such as &lt;a href="http://media.postgresql.org/sfpug/instagram_sfpug.pdf">Instagram&lt;/a> and &lt;a href="http://ontwik.com/python/disqus-scaling-the-world%E2%80%99s-largest-django-application/">Disqus&lt;/a>. Perhaps even more importantly it&amp;rsquo;s becoming easy to use PostgreSQL due to the many clouds that are running Postgres as a Service, such as:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://postgres.heroku.com">Heroku Postgres&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.vmware.com/products/application-platform/vfabric-data-director/features.html">VMware vFabric&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.enterprisedb.com/">Enterprise DB&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.cloudpostgres.com">Cloud Postgres&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Full disclosure, I work at &lt;a href="http://www.heroku.com">Heroku&lt;/a>, and am also a large fan of their database service&lt;/em>&lt;/p>
&lt;h2 id="listennotify" >
&lt;div>
Listen/Notify
&lt;/div>
&lt;/h2>
&lt;p>If you want to use your database as a queue there are some cases where it just won&amp;rsquo;t work, as heavily discussed in a &lt;a href="http://mikehadlow.blogspot.se/2012/04/database-as-queue-anti-pattern.html">recent write-up&lt;/a>. However, much of this could be discarded if you included Postgres in the discussion, due to Listen/Notify. Postgres will allow you to &lt;a href="http://www.postgresql.org/docs/9.1/static/sql-listen.html">LISTEN&lt;/a> for an event and of course &lt;a href="http://www.postgresql.org/docs/9.1/static/sql-notify.html">NOTIFY&lt;/a> when the event has occurred. A great example of this in action is &lt;a href="http://www.twitter.com/ryandotsmith">Ryan Smith&amp;rsquo;s&lt;/a> &lt;a href="https://github.com/ryandotsmith/queue_classic">Queue Classic&lt;/a>.&lt;/p>
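&lt;p>The mechanics are simple: one session subscribes to a channel and any other session can notify it (channel name and payload here are hypothetical):&lt;/p>
&lt;pre>&lt;code>-- session 1
LISTEN new_jobs;

-- session 2
NOTIFY new_jobs, 'job 42 ready';
&lt;/code>&lt;/pre>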
&lt;h2 id="fast-column-additionremoval" >
&lt;div>
Fast column addition/removal
&lt;/div>
&lt;/h2>
&lt;p>Want to add or remove a column? With millions of records this modification in some databases could take seconds or even minutes; in some cases I&amp;rsquo;ve even heard horror stories of adding a column taking hours. With Postgres this is a near-immediate action. The only time you pay a higher price is when you choose to write default data to a new column.&lt;/p>
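&lt;p>As a sketch of where that price shows up (table and column hypothetical):&lt;/p>
&lt;pre>&lt;code>-- near instant: only the catalog is updated
ALTER TABLE events ADD COLUMN source text;

-- pays the cost of writing the default into every existing row
ALTER TABLE events ADD COLUMN source text DEFAULT 'web';
&lt;/code>&lt;/pre>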
&lt;h2 id="table-inheritance" >
&lt;div>
Table Inheritance
&lt;/div>
&lt;/h2>
&lt;p>Want inheritance in your database just like you have with models inside your application code? Not a problem for Postgres. You can have one table easily inherit from another, leaving a cleaner data model within your database while still giving all the flexibility you&amp;rsquo;d like in your data model. The Postgres docs on &lt;a href="http://www.postgresql.org/docs/9.1/static/ddl-inherit.html">DDL Inheritance&lt;/a> do a great job of documenting how to use this and giving a very simple but clear use case.&lt;/p>
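&lt;p>The use case from the docs is cities and capitals, where a capital is a city with one extra column:&lt;/p>
&lt;pre>&lt;code>CREATE TABLE cities (
  name       text,
  population float,
  altitude   int
);

-- capitals inherits every column of cities and adds its own
CREATE TABLE capitals (
  state char(2)
) INHERITS (cities);
&lt;/code>&lt;/pre>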
&lt;h2 id="per-transaction-synchronous-replication" >
&lt;div>
Per transaction synchronous replication
&lt;/div>
&lt;/h2>
&lt;p>The default mode for Postgres streaming replication is asynchronous. This works well when you want to maintain performance but also care about your data. There are cases where you want your replication to be &lt;a href="http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION">synchronous&lt;/a>, though. Furthermore, asynchronous may work well enough for some data, while for other data you may care more and want synchronous replication, all within the same database. For example, if you care about user sign-ups and purchases, but ratings of products and comments are less important, Postgres provides the ability to treat them as such. With Postgres you can have per-transaction synchronous replication; this means you could have strong data guarantees on your user and purchase transactions, and weaker guarantees on others. You only pay the extra performance cost where you really care about it, versus the all-or-nothing approach you have with other databases.&lt;/p>
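&lt;p>This is controlled with &lt;code>synchronous_commit&lt;/code>, which can be set per transaction (assuming a synchronous standby is configured; the table here is hypothetical):&lt;/p>
&lt;pre>&lt;code>BEGIN;
-- don't wait on the standby for this lower-value write
SET LOCAL synchronous_commit = off;
INSERT INTO ratings (product_id, rating) VALUES (1, 4);
COMMIT;
&lt;/code>&lt;/pre>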
&lt;h2 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h2>
&lt;p>Hopefully you&amp;rsquo;re convinced that Postgres is a great tool; if not, take a look back at my &lt;a href="/2012/04/30/why-postgres/">previous post&lt;/a>.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend who has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Feedback for Conference Organizers</title><link>/2012/05/04/Feedback-for-Conference-Organizers/</link><pubDate>Fri, 04 May 2012 12:55:56 -0800</pubDate><guid>/2012/05/04/Feedback-for-Conference-Organizers/</guid><description>&lt;p>First, a huge thanks to all conference organizers, but especially those that organize not-for-profit conferences. I do understand it&amp;rsquo;s a great amount of work, and in nearly all cases I&amp;rsquo;ve greatly appreciated the experience made possible by the work they put into it.&lt;/p>
&lt;p>As for the guidance itself: I&amp;rsquo;ve been on nearly every side of a conference except organizing, so again, organizers, please don&amp;rsquo;t take offense at the feedback.&lt;/p>
&lt;h2 id="first-on-timeline" >
&lt;div>
First on timeline
&lt;/div>
&lt;/h2>
&lt;p>&lt;em>Please publish this early on your site and on &lt;a href="http://www.lanyrd.com">lanyrd&lt;/a>&lt;/em>&lt;/p>
&lt;ul>
&lt;li>Deadline for CFP&lt;/li>
&lt;li>Deadline for Call for Sponsorship&lt;/li>
&lt;li>Publish speaker list&lt;/li>
&lt;li>Early bird registration ends&lt;/li>
&lt;li>Regular registration ends&lt;/li>
&lt;/ul>
&lt;h2 id="as-a-sponsor" >
&lt;div>
As a sponsor
&lt;/div>
&lt;/h2>
&lt;h3 id="early-dates-and-times" >
&lt;div>
Early dates and times
&lt;/div>
&lt;/h3>
&lt;p>Please give sponsors very early notice of your conference dates. As an attendee I can decide within a couple of months, or sometimes even a few weeks, whether I&amp;rsquo;m able to make it to the conference. As a sponsor, unfortunately, I&amp;rsquo;m more restricted by budgets and timelines.&lt;/p>
&lt;h3 id="give-me-options-on-sponsorship" >
&lt;div>
Give me options on sponsorship
&lt;/div>
&lt;/h3>
&lt;p>A prospectus is great, and oftentimes I&amp;rsquo;m completely happy with it. Other times there are things I may want to sponsor that aren&amp;rsquo;t on the list. If videos aren&amp;rsquo;t already being recorded, I&amp;rsquo;d love to see the content live on, and that&amp;rsquo;s an immediate place that jumps out as valuable to sponsor. Flexibility alone often doesn&amp;rsquo;t determine whether I sponsor, but it does leave a clear impression of my experience.&lt;/p>
&lt;h3 id="location-of-exhibit-hall" >
&lt;div>
Location of exhibit hall
&lt;/div>
&lt;/h3>
&lt;p>Sometimes the exhibit hall is hidden away and only visited by attendees who really seek it out. Obviously this is less than ideal for a sponsor with a booth. The obvious solution is a central expo hall, but many conferences can go one step further and put lunch and the expo hall in the same area. Having foot traffic near and through the expo hall gives sponsors more exposure and better value.&lt;/p>
&lt;h2 id="as-a-speaker" >
&lt;div>
As a speaker
&lt;/div>
&lt;/h2>
&lt;h3 id="have-previous-years-talks-available" >
&lt;div>
Have previous years talks available
&lt;/div>
&lt;/h3>
&lt;p>If you are interested in attracting new speakers to your conference, please include last year&amp;rsquo;s talks. As a speaker, there&amp;rsquo;s often a new conference I&amp;rsquo;d be interested in attending; if I haven&amp;rsquo;t attended or spoken there before, I may lack the appropriate context. Keeping last year&amp;rsquo;s speaker list and talk topics available helps me decide whether it might be a fit.&lt;/p>
&lt;h3 id="publish-the-cfp" >
&lt;div>
Publish the CFP
&lt;/div>
&lt;/h3>
&lt;p>This is especially key to have on &lt;a href="http://www.lanyrd.com">lanyrd&lt;/a>, above all the other timeline items. There&amp;rsquo;s no better place for me to look today to get an idea of when CFPs are coming up. As a tip for others looking to submit talks to conferences, check out lanyrd&amp;rsquo;s list of &lt;a href="http://lanyrd.com/calls/">upcoming calls&lt;/a>.&lt;/p>
&lt;p>&lt;em>Bonus if you give me a signup form to get notified via email when the CFP is open&lt;/em>&lt;/p>
&lt;h3 id="turnaround" >
&lt;div>
Turnaround
&lt;/div>
&lt;/h3>
&lt;p>I understand there&amp;rsquo;s a lot to do when organizing a conference, but as much as possible keep the turnaround fast on these. There&amp;rsquo;s a lot of effort involved in reviewing talk submissions, so I understand it&amp;rsquo;s not a one-day activity. However, far too often I&amp;rsquo;ve reviewed talks for conferences and given feedback, only to see most of the activity occur in a mad sprint at the end of the planned time. If that&amp;rsquo;s going to happen anyway, at least compress the schedule.&lt;/p>
&lt;h3 id="speaker-dinner" >
&lt;div>
Speaker dinner
&lt;/div>
&lt;/h3>
&lt;p>As a speaker, I&amp;rsquo;ll miss a few talks as I prep for and then give my own. Additionally, I often have a lot in common with the other speakers at the conference. A good speaker dinner, with an opportunity to connect with them, can be what sets a conference apart for me. Good food and drinks always help, but most importantly the dinner brings all the speakers together.&lt;/p>
&lt;h3 id="notice" >
&lt;div>
Notice
&lt;/div>
&lt;/h3>
&lt;p>If your CFP is well in advance of the conference, advertise it early and often.&lt;/p>
&lt;h2 id="as-an-attendee" >
&lt;div>
As an attendee
&lt;/div>
&lt;/h2>
&lt;h3 id="power-and-internet" >
&lt;div>
Power and Internet
&lt;/div>
&lt;/h3>
&lt;p>Power anywhere and everywhere. Internet that works. I understand it&amp;rsquo;s hard, but everyone remembers when it works well, so it&amp;rsquo;s one straightforward way to make your conference stand out.&lt;/p>
&lt;p>&lt;em>If you really want to deliver a great experience, have a charging valet, let attendees drop off a device and pick it up an hour later.&lt;/em>&lt;/p>
&lt;h3 id="hallway-tracks" >
&lt;div>
Hallway Tracks
&lt;/div>
&lt;/h3>
&lt;p>Talks are awesome, but give people the opportunity to connect with everyone there. Many of the people at a conference I see only once a year, at that particular conference. If my choice is between a talk and catching up with an old friend, the old friend may win out. Give me the opportunity to do both.&lt;/p>
&lt;h3 id="evening-events" >
&lt;div>
Evening events
&lt;/div>
&lt;/h3>
&lt;p>I don&amp;rsquo;t want to get into the debate about drinking (coffee or alcohol), but evening events encourage socialization. &lt;a href="http://max.adobe.com/">Adobe&amp;rsquo;s MAX&lt;/a> conference was a great example of this: while there was beer, I never saw over-the-top drinking at the after-conference event. To keep it fun there were Xboxes, PS3s, Segway obstacle courses and much more. If this isn&amp;rsquo;t something the conference wants to condone or organize itself, there&amp;rsquo;s likely an event happening somewhere; help with the publicity of those. With any luck the non-conference evening events are well done, and the conference spirit continues into the evening.&lt;/p></description></item><item><title>Why Postgres</title><link>/2012/04/30/Why-Postgres/</link><pubDate>Mon, 30 Apr 2012 12:55:56 -0800</pubDate><guid>/2012/04/30/Why-Postgres/</guid><description>&lt;p>&lt;em>This post is a list of many of the reasons to use Postgres; much of this content, as well as how to use these features, will later be curated within &lt;a href="http://www.postgresguide.com">PostgresGuide.com&lt;/a>. If you need to get started check out &lt;a href="http://postgresapp.com">Postgres.app&lt;/a> for Mac, or get a Cloud instance at &lt;a href="https://postgres.heroku.com/?utm_source=referral&amp;amp;utm_medium=content&amp;amp;utm_campaign=craigkerstiens">Heroku Postgres&lt;/a> for free&lt;/em>&lt;/p>
&lt;p>&lt;em>UPDATE: A &lt;a href="/2012/05/07/why-postgres-part-2/">part 2&lt;/a> has been posted on &lt;a href="/2012/05/07/why-postgres-part-2/">Why Use Postgres&lt;/a>&lt;/em>&lt;/p>
&lt;p>Very often recently I find myself explaining why Postgres is so great. In an effort to save myself a bit of time repeating this, I thought it best to consolidate why Postgres is so great and dispel some of the historical arguments against it.&lt;/p>
&lt;h2 id="replication" >
&lt;div>
Replication
&lt;/div>
&lt;/h2>
&lt;p>For some time the biggest argument for MySQL over Postgres was the lack of a good replication story for Postgres. With the release of &lt;a href="http://www.postgresql.org/docs/8.4/static/high-availability.html">8.4 Postgres&amp;rsquo;s&lt;/a> story around replication quickly became much better.&lt;/p>
&lt;p>&lt;em>While replication is indeed very important, are users actually setting up replication each time with MySQL or is it to only have the option later?&lt;/em>&lt;/p>
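&lt;p>As a rough sketch of what enabling streaming replication involved on a 9.0-era primary (values here are illustrative; see the high-availability docs linked above for the full setup, including the standby side):&lt;/p>
&lt;pre tabindex="0">&lt;code># postgresql.conf on the primary
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 32
&lt;/code>&lt;/pre>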
&lt;h2 id="window-functions" >
&lt;div>
Window functions
&lt;/div>
&lt;/h2>
&lt;p>This is a feature those familiar with Oracle greatly missed in Postgres. In fact even SQL Server had some form of them, though via T-SQL, which adds a bit more complexity. This is a feature that, once you have it, you can&amp;rsquo;t live without; the options that existed before were slower and much more complicated. With the release of &lt;a href="http://www.postgresql.org/docs/9.1/static/tutorial-window.html">8.4&lt;/a>, window functions were added to bring Postgres on par with Oracle in this area. For more info on using them check the Postgres docs above or &lt;a href="http://postgresguide.com/tips/window.html">PostgresGuide.com&lt;/a>.&lt;/p>
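&lt;p>A quick sketch of what a window function looks like (the &lt;code>empsalary&lt;/code> table mirrors the tutorial linked above):&lt;/p>
&lt;pre tabindex="0">&lt;code>-- Rank each employee's salary within their department,
-- while still returning every individual row
SELECT depname, empno, salary,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
&lt;/code>&lt;/pre>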
&lt;h2 id="flexible-datatypes" >
&lt;div>
Flexible Datatypes
&lt;/div>
&lt;/h2>
&lt;p>Creating a custom datatype is simpler in Postgres than in any other database I&amp;rsquo;ve used, by far. Even excluding custom datatypes, Postgres&amp;rsquo;s out-of-the-box datatypes make it stand out far ahead of other databases, in particular the ability to create a column as an &lt;a href="http://www.postgresql.org/docs/9.1/static/arrays.html">Array&lt;/a> of some datatype. Want to store a game of tic-tac-toe, or one user&amp;rsquo;s phone numbers? It simply becomes a single column that can contain multiple values.&lt;/p>
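&lt;p>For example, the phone numbers case is as simple as (illustrative schema):&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE TABLE users (
    id            serial PRIMARY KEY,
    phone_numbers varchar[]
);

INSERT INTO users (phone_numbers)
VALUES ('{"555-0100", "555-0199"}');

-- Postgres arrays are 1-indexed
SELECT phone_numbers[1] FROM users;
&lt;/code>&lt;/pre>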
&lt;h2 id="functions" >
&lt;div>
Functions
&lt;/div>
&lt;/h2>
&lt;p>Need to do some logic outside of standard SQL? Postgres likely has a function already available to do it for you. What about the time you wanted to take all the rows returned by a query and combine them into a single array? Give &lt;a href="http://www.postgresql.org/docs/9.1/static/functions-aggregate.html">array_agg a look&lt;/a>. Need to split a string and grab part of it, or perform some other string operation? There&amp;rsquo;s a &lt;a href="http://www.postgresql.org/docs/9.1/static/functions-string.html">function for that&lt;/a>.&lt;/p>
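&lt;p>Two quick sketches of these (the &lt;code>post_tags&lt;/code> table is hypothetical):&lt;/p>
&lt;pre tabindex="0">&lt;code>-- Collapse one row per tag into a single array per post
SELECT post_id, array_agg(tag)
FROM post_tags
GROUP BY post_id;

-- Grab the domain portion of an email address
SELECT split_part('craig@example.com', '@', 2);
&lt;/code>&lt;/pre>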
&lt;h2 id="custom-languages" >
&lt;div>
Custom Languages
&lt;/div>
&lt;/h2>
&lt;p>Want to use another language inside your database? Postgres probably supports it:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://www.postgresql.org/docs/9.1/static/plpython.html">Python in Postgres&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/knu/postgresql-plruby">Ruby in Postgres&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.joeconway.com/plr/">R in Postgres&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://code.google.com/p/plv8js/wiki/PLV8">V8 in Postgres&lt;/a>&lt;/li>
&lt;/ul>
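&lt;p>For instance, with PL/Python installed you can write functions in Python directly (the &lt;code>pymax&lt;/code> example follows the PL/Python docs):&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if a &gt; b:
    return a
  return b
$$ LANGUAGE plpythonu;

SELECT pymax(1, 2);
&lt;/code>&lt;/pre>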
&lt;h2 id="extensions" >
&lt;div>
Extensions
&lt;/div>
&lt;/h2>
&lt;p>Need to go beyond standard Postgres? There&amp;rsquo;s a good chance that someone else has, and that there&amp;rsquo;s already an extension for it. Extensions take Postgres further with things such as Geospatial support, JSON data types, Key Value Stores, and connecting to external data sources (Oracle, MySQL, Redis). I could easily have a full post on extensions available alone, fortunately someone else has already created an awesome one - &lt;a href="http://blog.railsware.com/2012/04/23/postgresql-most-useful-extensions/">PostgreSQL Most Useful Extensions&lt;/a>.&lt;/p>
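&lt;p>Since 9.1, enabling an extension is a one-liner once its packages are on the server:&lt;/p>
&lt;pre tabindex="0">&lt;code>-- See what is available on this server
SELECT name, default_version FROM pg_available_extensions;

-- Then enable one
CREATE EXTENSION hstore;
&lt;/code>&lt;/pre>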
&lt;h2 id="nosql-gives-flexibility" >
&lt;div>
NoSQL gives flexibility
&lt;/div>
&lt;/h2>
&lt;p>I don&amp;rsquo;t want to get too deep into the NoSQL versus SQL debate&amp;hellip; no matter which side you fall on, you can get both in Postgres. With hstore and &lt;a href="http://code.google.com/p/plv8js/wiki/PLV8">PLV8&lt;/a> you&amp;rsquo;ll get the flexibility in your data that you would with Mongo, along with all of the above features. &lt;a href="http://www.twitter.com/leinweber">Will Leinweber&lt;/a> has a talk that he&amp;rsquo;s given at several conferences recently that highlights &lt;a href="http://ssql-railsconf.herokuapp.com/">Schemaless SQL&lt;/a>.&lt;/p>
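&lt;p>A small sketch of the hstore side (the &lt;code>products&lt;/code> table is hypothetical):&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE TABLE products (
    id    serial PRIMARY KEY,
    attrs hstore
);

-- Each row can carry its own set of keys
INSERT INTO products (attrs)
VALUES ('color =&gt; "red", size =&gt; "M"');

-- Schemaless lookup on a single key
SELECT attrs -&gt; 'color' FROM products;
&lt;/code>&lt;/pre>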
&lt;h2 id="custom-functions" >
&lt;div>
Custom Functions
&lt;/div>
&lt;/h2>
&lt;p>Didn&amp;rsquo;t find the function you wanted in the above? Try creating it yourself:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span> CREATE FUNCTION awesomeness&lt;span style="color:#f92672">(&lt;/span>varchar&lt;span style="color:#f92672">)&lt;/span> RETURNS boolean
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> AS &lt;span style="color:#e6db74">&amp;#39;CASE WHEN $1 == \&amp;#39;&lt;/span>postgres&lt;span style="color:#ae81ff">\&amp;#39;&lt;/span> THEN TRUE ELSE FALSE END;&lt;span style="color:#960050;background-color:#1e0010">&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> LANGUAGE SQL
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> IMMUTABLE
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> RETURNS NULL ON NULL INPUT;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="common-table-expressions" >
&lt;div>
Common Table Expressions
&lt;/div>
&lt;/h2>
&lt;p>Oftentimes when exploring data or creating a new view you&amp;rsquo;ll want to load data into a temporary table, and when exploring you only need it for a short time. Why go through the effort of actually creating a temporary table, especially if you only need it for a single query? &lt;a href="http://www.postgresql.org/docs/8.4/static/queries-with.html">Common Table Expressions&lt;/a> let you accomplish just that.&lt;/p>
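&lt;p>A minimal sketch, along the lines of the example in the docs above (the &lt;code>orders&lt;/code> table is hypothetical):&lt;/p>
&lt;pre tabindex="0">&lt;code>WITH regional_sales AS (
    SELECT region, sum(amount) AS total_sales
    FROM orders
    GROUP BY region
)
SELECT region, total_sales
FROM regional_sales
WHERE total_sales &gt; 10000;
&lt;/code>&lt;/pre>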
&lt;h2 id="development-pace" >
&lt;div>
Development Pace
&lt;/div>
&lt;/h2>
&lt;p>For some period of time MySQL and Postgres were both moving at a fast pace. In recent years, though, Postgres has rapidly increased how much gets packed into a single release. Just take a look at the &lt;a href="http://en.wikipedia.org/wiki/PostgreSQL#Major_releases">Major Releases&lt;/a>.&lt;/p>
&lt;h2 id="conclusion" >
&lt;div>
Conclusion
&lt;/div>
&lt;/h2>
&lt;p>Hopefully you&amp;rsquo;re convinced on why Postgres is a great tool. Next take a visit to &lt;a href="http://www.postgresguide.com">PostgresGuide&lt;/a> if you need some direction on where to start or how to use many of the above features.&lt;/p>
&lt;p>&lt;em>If you&amp;rsquo;re looking for a deeper resource on Postgres I recommend the book &lt;a href="https://theartofpostgresql.com/?affiliate=cek">The Art of PostgreSQL&lt;/a>. It is by a personal friend who has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you&amp;rsquo;ll receive 15% off as well.&lt;/em>&lt;/p></description></item><item><title>Apps to Services</title><link>/2012/04/13/Apps-to-Services/</link><pubDate>Fri, 13 Apr 2012 12:55:56 -0800</pubDate><guid>/2012/04/13/Apps-to-Services/</guid><description>&lt;p>&lt;em>Update: the talk for this is now viewable on YouTube &lt;a href="http://www.youtube.com/watch?v=ztGpK-v2Oow">here&lt;/a>&lt;/em>&lt;/p>
&lt;p>When I first came across Django I was an immediate fan. It featured:&lt;/p>
&lt;ul>
&lt;li>Good documentation&lt;/li>
&lt;li>Steady but stable progress&lt;/li>
&lt;li>Community around apps which encouraged &lt;em>DRY&lt;/em>&lt;/li>
&lt;/ul>
&lt;p>I&amp;rsquo;ve been a user off and on, depending on my needs, for nearly four years since discovering it, and throughout that time all of the above have remained true. However, as I&amp;rsquo;ve worked on and encountered more complex applications, there&amp;rsquo;s one thing that has time and again broken down for me: the Django apps model. It hasn&amp;rsquo;t broken down due to Django alone; I&amp;rsquo;ve seen it break down in Ruby (Rails), Java, .Net, take your pick of language or framework.&lt;/p>
&lt;p>The breakdown of this model is due to several things:&lt;/p>
&lt;ul>
&lt;li>Successful applications grow which mean more complex applications and more developers&lt;/li>
&lt;li>More complex applications often mean larger code bases
&lt;ul>
&lt;li>Deprecating code is good, but not always easy in large code bases&lt;/li>
&lt;li>More code means more testing, and in turn slower releases&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>At Heroku one way we often describe the platform to others is &lt;em>&amp;ldquo;A distributed Unix in the cloud.&amp;rdquo;&lt;/em> There may be many reasons for this, but one of which is that we love the Unix approach and philosophy of &lt;em>Small sharp tools&lt;/em>. Sticking to that, many of our internal pieces are small individual apps that talk across defined contracts or APIs.&lt;/p>
&lt;p>Back to Django&amp;rsquo;s app structure&amp;hellip; Many people build apps and re-use them and often share them with the world. This is truly great for re-usability, which means you can focus on building key features. However, this does not enable your application to be more maintainable in the future nor does it enable scalability. Yes, you can absolutely scale a monolithic application, but it doesn&amp;rsquo;t mean you should. This doesn&amp;rsquo;t mean the app structure is entirely broken, it just means that it is a partial step to where you should be. The real solution is to build more of these pieces of your greater application as services.&lt;/p>
&lt;p>A Django app is defined as &lt;em>A web application that does something. I.e. Weblog, Poll, Ticket system&lt;/em>. Within Django an app contains:&lt;/p>
&lt;ul>
&lt;li>Models&lt;/li>
&lt;li>Views&lt;/li>
&lt;li>URLs&lt;/li>
&lt;/ul>
&lt;p>I couldn&amp;rsquo;t find a great definition of a Service that was succinct and also said something of value (if you have one, please pass it along, as I&amp;rsquo;d love to have a definition from a source other than myself). For the sake of setting something in place, I&amp;rsquo;m defining a service as a &lt;em>method of communication over the web with a provider using a defined contract&lt;/em>. By this definition a service contains:&lt;/p>
&lt;ul>
&lt;li>Provider&lt;/li>
&lt;li>Endpoint&lt;/li>
&lt;li>Contract&lt;/li>
&lt;/ul>
&lt;p>Let me clarify this a bit further&amp;hellip;&lt;/p>
&lt;p>Tangible example/parable:&lt;/p>
&lt;p>Django Apps::&lt;/p>
&lt;ul>
&lt;li>Ticket&lt;/li>
&lt;li>FAQ&lt;/li>
&lt;/ul>
&lt;p>Company Teams::&lt;/p>
&lt;ul>
&lt;li>Support&lt;/li>
&lt;li>Community Evangelist&lt;/li>
&lt;/ul>
&lt;p>You start with two apps that maybe share a little code; at the least they exist in a central code base. Then you deploy something, and the Ticket app can no longer create FAQs due to a change in one or the other. There&amp;rsquo;s no finger to point, but more importantly, you don&amp;rsquo;t know whom to contact to resolve it. Neither team wants to deploy, so you test more. Before every deploy you run tests&amp;hellip; and validate a build&amp;hellip; and deployment slows&amp;hellip; well, maybe not with two teams. But as you get to 5 teams it does, and more so with 15, and more so with 30. Then you hire a build master and a release master, and who really wants that?&lt;/p>
&lt;p>So within Django maybe you go from apps all in the same codebase to releasing private versions of apps&amp;hellip;&lt;/p>
&lt;p>Your requirements.txt for a main site looks like:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>FAQ&lt;span style="color:#f92672">==&lt;/span>&lt;span style="color:#ae81ff">0.2&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You have three apps which depend on it: support, marketing, and billing. You bump the version to &lt;code>FAQ==0.3&lt;/code>, and then either all three teams have to upgrade to the new APIs at once or none can. However, if your interface was:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>data &lt;span style="color:#f92672">=&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>question&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>: &lt;span style="color:#960050;background-color:#1e0010">“&lt;/span>my question&lt;span style="color:#960050;background-color:#1e0010">”&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>source&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>: &lt;span style="color:#ae81ff">123&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>requests&lt;span style="color:#f92672">.&lt;/span>POST(os&lt;span style="color:#f92672">.&lt;/span>environ[&lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>FAQ_API&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>] &lt;span style="color:#f92672">+&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>&lt;span style="color:#f92672">/&lt;/span>v1&lt;span style="color:#f92672">/&lt;/span>create&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>, data&lt;span style="color:#f92672">=&lt;/span>data)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You could also have:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>data &lt;span style="color:#f92672">=&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>question&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>: &lt;span style="color:#960050;background-color:#1e0010">“&lt;/span>my question&lt;span style="color:#960050;background-color:#1e0010">”&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>source&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>: &lt;span style="color:#ae81ff">123&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>related&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>: [&lt;span style="color:#ae81ff">456&lt;/span>, &lt;span style="color:#ae81ff">789&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>requests&lt;span style="color:#f92672">.&lt;/span>POST(os&lt;span style="color:#f92672">.&lt;/span>environ[&lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>FAQ_API&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>] &lt;span style="color:#f92672">+&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">‘&lt;/span>&lt;span style="color:#f92672">/&lt;/span>v2&lt;span style="color:#f92672">/&lt;/span>create&lt;span style="color:#960050;background-color:#1e0010">’&lt;/span>, data&lt;span style="color:#f92672">=&lt;/span>data)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then you can easily support both, deprecate v1, and track its usage easily. This doesn&amp;rsquo;t guarantee, but it does enable &lt;em>re-usability&lt;/em>, &lt;em>scalability&lt;/em>, &lt;em>maintainability&lt;/em>. And of course continues to let you build features instead of maintaining software.&lt;/p>
&lt;p>In the next post I&amp;rsquo;ll go into a bit more detail of how a real example looks with apps in both forms, using a set of Django Apps and using a set of Services built on Django Apps.&lt;/p>
&lt;p>&lt;em>Slides from a corresponding talk at DjangoCong are &lt;a href="http://bit.ly/djangocong">here&lt;/a>&lt;/em>&lt;/p></description></item><item><title>Sphinx Build Pack on Heroku</title><link>/2012/01/25/Sphinx-Build-Pack-on-Heroku/</link><pubDate>Wed, 25 Jan 2012 12:55:56 -0800</pubDate><guid>/2012/01/25/Sphinx-Build-Pack-on-Heroku/</guid><description>&lt;p>Heroku&amp;rsquo;s latest Cedar stack supports running anything. Heroku&amp;rsquo;s officially supported languages actually have their buildpacks public via &lt;a href="http://github.com/heroku/">Heroku&amp;rsquo;s github&lt;/a>, you can view several of them at:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/heroku/heroku-buildpack-python">Python Build Pack&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/heroku/heroku-buildpack-ruby">Ruby Build Pack&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/heroku/heroku-buildpack-scala">Scala Build Pack&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;em>There have even been some created as fun weekend hacks such as the &lt;a href="http://github.com/hone/heroku-buildpack-jsnes">NES Rom Buildpack&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Recently at Heroku my teams have started exploring new forms of collaborating and documenting. In particular, editing a wiki for communication runs contrary to our regular workflow; much of our day is spent in code and git. Editing a wiki in a web browser, using markup we&amp;rsquo;re less familiar with, is overhead we were aiming to reduce. As a result we&amp;rsquo;ve tried a few things; the first was simply using a GitHub repo to edit markdown.&lt;/p>
&lt;p>Personally I have always been a fan of Sphinx documentation. However, Sphinx has no means to secure a site out of the box, and generating the static site and then wrapping it in a Rack app to secure it seemed like a few extra steps that would hinder the workflow. As a result I set out to build a Sphinx buildpack that lets you push a Sphinx project to Heroku and automatically serves your documentation. The buildpack supports two modes: public documentation and private documentation. To have your documentation secured in private mode you simply need to add your Google Apps domain as a config var: &lt;code>heroku config:add DOMAIN=mydomain.com&lt;/code>.&lt;/p>
&lt;p>&lt;em>If you need more information about setting up OpenID check out my recent post &lt;a href="/2012/01/23/securing-your-organization/">Securing your organization with OpenID &lt;/a>&lt;/em>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span> $ sphinx-quickstart
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> $ git init .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> $ git add .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> $ git commit -m initial
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> $ heroku create -s cedar -b http://github.com/craigkerstiens/heroku-buildpack-sphinx.git
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> $ git push heroku master
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> $ heroku open
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Securing your Internal Organization with OpenID</title><link>/2012/01/23/Securing-your-Internal-Organization-with-OpenID/</link><pubDate>Mon, 23 Jan 2012 12:55:56 -0800</pubDate><guid>/2012/01/23/Securing-your-Internal-Organization-with-OpenID/</guid><description>&lt;p>I&amp;rsquo;ve recently been amazed at the number of companies that are still using a VPN or other means to manage their apps/network. Not just large enterprisey companies, but small agile startups. I fully understand that it works, but 95% of these places are also using another key tool for access inside their company&amp;hellip; &lt;em>Google Apps&lt;/em>. I fully expect companies to use Google Apps; it&amp;rsquo;s more the former that surprises me. For a long time OpenID wasn&amp;rsquo;t at a usable point, and even today it isn&amp;rsquo;t without its faults. However, once in place it does make for a much cleaner workflow, since your users log in with credentials they&amp;rsquo;re already used to using elsewhere.&lt;/p>
&lt;p>In our personal lives we use email as our keys to the kingdom. In fact, I now almost refuse to sign up for any service that doesn&amp;rsquo;t let me use OAuth, so why should a workplace be much different? I inquired with a few companies to see if they were fine with securing things like documentation or wikis behind Google auth; they indeed were. Yet they still have users keep one more username and password for the VPN they must log in to before accessing internal docs/tools.&lt;/p>
&lt;p>Most tech-centric companies grow their own apps for many of the things they do. Even the heavier adopters of SaaS still end up building a lot of internal systems. So why not secure them with your email domain, just as you commonly would if it were a public service?&lt;/p>
&lt;p>The catch is that OpenID with Google has some initial setup overhead, but after that it works unbelievably well.&lt;/p>
&lt;h2 id="the-catch" >
&lt;div>
The catch
&lt;/div>
&lt;/h2>
&lt;p>In some cases you currently have to identify your domain as an OpenID provider, so that @yourname.com itself acts as one. This simply means serving an OpenID discovery document from a URL route on your base site, similar to the below:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-xml" data-lang="xml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&amp;lt;?xml version=&amp;#34;1.0&amp;#34; encoding=&amp;#34;UTF-8&amp;#34;?&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">&amp;lt;xrds:XRDS&lt;/span> &lt;span style="color:#a6e22e">xmlns:xrds=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;xri://$xrds&amp;#34;&lt;/span> &lt;span style="color:#a6e22e">xmlns=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;xri://$xrd*($v*2.0)&amp;#34;&lt;/span>&lt;span style="color:#f92672">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;lt;XRD&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;lt;Service&lt;/span> &lt;span style="color:#a6e22e">priority=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;0&amp;#34;&lt;/span>&lt;span style="color:#f92672">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;lt;Type&amp;gt;&lt;/span>http://specs.openid.net/auth/2.0/signon&lt;span style="color:#f92672">&amp;lt;/Type&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;lt;URI&amp;gt;&lt;/span>https://www.google.com/a/craigkerstiens.com/o8/ud?be=o8&lt;span style="color:#f92672">&amp;lt;/URI&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;lt;/Service&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;lt;/XRD&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">&amp;lt;/xrds:XRDS&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;em>This is due to an issue with OpenID discovery, which you can read more about in &lt;a href="https://groups.google.com/group/google-federated-login-api/browse_thread/thread/4a7dd2312a47a082/9285cec18a30b9d3?lnk=gst&amp;amp;q=apps+discovery&amp;amp;pli=1#9285cec18a30b9d3">this thread on the google-federated-login-api group&lt;/a>. In short, setting up the above can save you a lot of time.&lt;/em>&lt;/p>
&lt;h2 id="setting-up-in-apps" >
&lt;div>
Setting up in apps
&lt;/div>
&lt;/h2>
&lt;p>Most web frameworks have libraries that make it easy to secure your apps with OpenID/OAuth; Django and Rails in particular make this pretty easy. To make it even simpler, below is code to secure an internal app in each. You can do similar with Flask or Sinatra as well.&lt;/p>
&lt;h3 id="rails" >
&lt;div>
Rails
&lt;/div>
&lt;/h3>
&lt;p>In case your admin controller isn&amp;rsquo;t already generated:&lt;/p>
&lt;pre tabindex="0">&lt;code>rails g controller admin/users
&lt;/code>&lt;/pre>&lt;p>Then anything you want to secure:&lt;/p>
&lt;pre tabindex="0">&lt;code>module Admin
class UsersController &amp;lt; ApplicationController
before_filter :admin_required
def index
render :text =&amp;gt; &amp;#39;Hello from the admin panel!&amp;#39;
end
end
end
&lt;/code>&lt;/pre>&lt;h3 id="django" >
&lt;div>
Django
&lt;/div>
&lt;/h3>
&lt;p>Finally sync your database:&lt;/p>
&lt;pre tabindex="0">&lt;code>python yourapp/manage.py syncdb
&lt;/code>&lt;/pre>&lt;p>Secure any view with the &lt;code>login_required&lt;/code> decorator as you typically would with Django.&lt;/p>
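To illustrate what a `login_required`-style decorator does, here is a minimal, framework-free Python sketch. The decorator, the dict-based request, and the redirect string are all stand-ins for illustration; this is not Django's actual implementation.

```python
# Illustrative sketch of how a login_required-style decorator gates a view.
from functools import wraps

def login_required(view):
    @wraps(view)
    def wrapper(request, *args, **kwargs):
        # Anonymous users get redirected to the login page; the view never runs.
        if not request.get("user"):
            return "302 redirect to /accounts/login/"
        return view(request, *args, **kwargs)
    return wrapper

@login_required
def dashboard(request):
    return "200 secret admin dashboard"

print(dashboard({}))                 # anonymous request: redirected
print(dashboard({"user": "craig"}))  # authenticated request: view runs
```

In real Django the decorator inspects `request.user.is_authenticated` and redirects to your configured login URL, but the gating logic follows this same shape.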
&lt;h2 id="summary" >
&lt;div>
Summary
&lt;/div>
&lt;/h2>
&lt;p>In short, with some very basic app setup, you can have an internal workflow that&amp;rsquo;s just as good as what you use day to day outside the office.&lt;/p></description></item><item><title>How Heroku Works - Hiring</title><link>/2011/12/02/How-Heroku-Works-Hiring/</link><pubDate>Fri, 02 Dec 2011 12:55:56 -0800</pubDate><guid>/2011/12/02/How-Heroku-Works-Hiring/</guid><description>&lt;p>I alluded in earlier posts on &lt;a href="http://www.craigkerstiens.com/2011/11/02/how-heroku-works-teams-tools/">How Heroku Works&lt;/a> that we have talented engineers. In fact I would venture to say that there is not a weak link when it comes to our engineers at Heroku. Ensuring we have talented engineers makes it easier for us to find other talented engineers and maintains a level of quality in our product. This means we must be very careful about not diluting our pool of engineering talent, which is where our hiring process becomes especially key. By the time we hire a new employee, we know without a doubt they&amp;rsquo;re a fit within our organization.&lt;/p>
&lt;p>&lt;em>Our goal in hiring is seldom to fill a role, but more commonly to find more talented people who share our goal: changing the world for developers.&lt;/em>&lt;/p>
&lt;p>So what does our hiring process look like&amp;hellip;&lt;/p>
&lt;ol>
&lt;li>Review Resume/Github Profile&lt;/li>
&lt;li>Initial Screen&lt;/li>
&lt;li>Second Screen&lt;/li>
&lt;li>Starter Project&lt;/li>
&lt;/ol>
&lt;p>While there&amp;rsquo;s definitely a process that we follow, the steps themselves aren&amp;rsquo;t what&amp;rsquo;s interesting. We all too often get worried about steps 1, 2, 3&amp;hellip; Instead you should focus on what&amp;rsquo;s important: are they a fit? Can they get shit done? Who cares how many phone screens someone goes through?! Five phone screens instead of two doesn&amp;rsquo;t make them a better fit for your company. The short of it is that a candidate goes through enough screens for you to feel comfortable, and then you progress them through the process. At any point in the process, if someone is determined not to be a fit, the process ends there. If it does end, the hiring manager will relay this in the appropriate form, though always in writing via email as well.&lt;/p>
&lt;p>The hiring manager could debatably be the biggest difference between our process and others&amp;rsquo;. When a candidate applies to a position it goes to the hiring manager (&lt;em>not an HR person&lt;/em>). The hiring manager will be your manager once at Heroku; they&amp;rsquo;re one and the same, which ensures from the start of the process that the candidate and the manager mesh well. Yes, having the manager of a group review GitHub profiles and resumes is extra effort, but who better to judge from a quick glance than engineers? In general, as a manager you&amp;rsquo;re evaluated on the success of those you manage, so you should be invested heavily in those you hire. We also find it makes a big difference in the onboarding process and how quickly someone can succeed. We have used many approaches, but the success of someone at Heroku when their hiring manager and manager are the same individual is best highlighted below:&lt;/p>
&lt;p>&lt;img src="https://f.cl.ly/items/462y1J3G0L3q1f3v1o1U/hiring-1.png" alt="Hiring Manager = Manager">&lt;/p>
&lt;p>While every step in the hiring process is valuable, starter projects may be the most valuable for ensuring quality. The final step with nearly everyone we hire is to invite them to come hack with us. Instead of parading someone around for a day-long interview, we get down to business and write some code. It could be something internal to Heroku, it could be an open source project we use, it could be something interesting that the candidate feels would add value to Heroku. Starter projects vary slightly between hiring managers.&lt;/p>
&lt;p>Several of our managers prefer to lay out several potentially interesting projects, talk through them with the candidate, and then let the candidate decide what they&amp;rsquo;d like to work on. Sometimes there&amp;rsquo;s a pressing need the candidate can jump right into and add some value. It&amp;rsquo;s &lt;strong>always&lt;/strong> important that the starter project is achievable; if it&amp;rsquo;s too broad or difficult for a 1-2 day period, then the manager has failed in the hiring process. Regardless of the project, it&amp;rsquo;s far more than an exercise on a whiteboard; it&amp;rsquo;s actually what life is like at Heroku. We have lunch at the &lt;a href="http://www.flickr.com/photos/teich/4928103311/">same table&lt;/a> that we eat at every other day, we interact just as we normally would, and after work there may or may not &lt;a href="http://drunken-samurai-42.tumblr.com/">be drinks&lt;/a> just like any other day.&lt;/p>
&lt;p>&lt;em>As a slight aside, we even conduct starter projects when current Herokai move from one team to another.&lt;/em>&lt;/p>
&lt;p>Starter projects usually last anywhere from a day to several days. At the end of a starter project the candidate presents what they did, in a similar fashion to weekly demos that occur at workshop (more on that some other time). In earlier days it was nearly the entire company that would sit in, ask questions and give feedback. Now it&amp;rsquo;s a bit harder for all of us to fit into one conference room, though there&amp;rsquo;s an open invite and anyone that wishes can sit in (often 10-20 Herokai). At the end of the starter project there&amp;rsquo;s no question that the candidate fits or doesn&amp;rsquo;t, often from both sides. Of course if it&amp;rsquo;s a fit we make an offer and welcome them into the family.&lt;/p></description></item><item><title>Getting Started with Django</title><link>/2011/11/12/Getting-Started-with-Django/</link><pubDate>Sat, 12 Nov 2011 19:55:56 -0800</pubDate><guid>/2011/11/12/Getting-Started-with-Django/</guid><description>&lt;p>For those completely new to web development, Django is a web framework that makes it easier to build web applications with Python. For those that have some knowledge of other web frameworks and Django you may be able to fly through much of the following. Django is a slight modification of the &lt;a href="http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC&lt;/a> construct, describing itself as &lt;a href="https://docs.djangoproject.com/en/dev/faq/general/#django-appears-to-be-a-mvc-framework-but-you-call-the-controller-the-view-and-the-view-the-template-how-come-you-don-t-use-the-standard-names">MVT&lt;/a>: Model, View, Template. Django views a website as a project, with smaller apps contained within it.&lt;/p>
&lt;p>Earlier we installed Django into your virtual environment. With that environment loaded, we can get started with a Django project. First let&amp;rsquo;s create the project:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>django-admin.py startproject myproject
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This should have created a directory for your project called myproject. Within the myproject folder you&amp;rsquo;ll find some core files of every Django project.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ ls
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>myproject venv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ ls myproject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>manage.py myproject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ ls myproject/myproject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>__init__.py settings.py urls.py
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Let&amp;rsquo;s examine a little of each of the files and what they&amp;rsquo;re used for:&lt;/p>
&lt;p>&lt;code>manage.py&lt;/code> - A management utility for interacting with your Django project. In addition to the default commands available, which you can see by running &lt;code>python manage.py&lt;/code>, you can create custom commands (we&amp;rsquo;ll get into that much later).
&lt;code>settings.py&lt;/code> - This is the settings file for your application. Here you&amp;rsquo;ll put various configuration and load things such as your database connection.
&lt;code>urls.py&lt;/code> - This is the place to set up how your URLs will work. You&amp;rsquo;ll define a path for the URL and then which code is executed when you visit it.&lt;/p></description></item><item><title>Postgres... The death of NoSQL</title><link>/2011/11/08/Postgres...-The-death-of-NoSQL/</link><pubDate>Tue, 08 Nov 2011 12:55:56 -0800</pubDate><guid>/2011/11/08/Postgres...-The-death-of-NoSQL/</guid><description>&lt;p>NoSQL has long been a trend that many have talked about. While there&amp;rsquo;s a place for various key-value stores and tools such as memcache and redis, this post will specifically address how NoSQL is attempting to replace the traditional database. I&amp;rsquo;ve long been a fan of Postgres and, in general, traditional relational databases. In a broad sense traditional databases offer multiple things.&lt;/p>
&lt;h2 id="rdms" >
&lt;div>
RDBMS
&lt;/div>
&lt;/h2>
&lt;h3 id="data-guarantees" >
&lt;div>
Data guarantees
&lt;/div>
&lt;/h3>
&lt;p>The current major SQL databases (SQL Server, Postgres, MySQL, Oracle) offer guarantees around your data that don&amp;rsquo;t always exist with other systems. At a very high level this means that when they say they have the data, there&amp;rsquo;s not a chance they&amp;rsquo;ll lose it. When using something as a primary datastore this is always my first requirement. Data is a valuable commodity, so keeping it around is obviously important. There are cases where exceptions exist (reporting applications are common here). The specific thing I always look for is that a system upholds the ACID properties. For a quick breakdown of these:&lt;/p>
&lt;ul>
&lt;li>&lt;em>A&lt;/em> is for atomic. In short it means no transaction can be partially completed; it&amp;rsquo;s all or nothing.&lt;/li>
&lt;li>&lt;em>C&lt;/em> is for consistent. This means you go from one consistent state to another; things like cascades and constraints are upheld and can&amp;rsquo;t be ignored for a period of time.&lt;/li>
&lt;li>&lt;em>I&lt;/em> is for isolation. This means transactions don&amp;rsquo;t get to interfere with each other.&lt;/li>
&lt;li>&lt;em>D&lt;/em> is for durability. This means once a transaction is committed, it&amp;rsquo;s not going anywhere.&lt;/li>
&lt;/ul>
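To make atomicity concrete, here is a small sketch using Python's built-in sqlite3 module (itself an ACID-compliant store). The accounts table and the simulated mid-transaction crash are hypothetical, purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # The connection as a context manager commits on success, rolls back on error.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("crash before the matching credit is applied")
except RuntimeError:
    pass  # the partial transfer was rolled back, not half-applied

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
print(balance)  # 100 — the partial debit never became visible
```

The debit and the crash happened inside one transaction, so the database returns to its prior consistent state rather than losing 50 into thin air.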
&lt;p>These basic principles make me feel pretty content with my data being safe. This doesn&amp;rsquo;t include things like backups and replication, but rather is a baseline for me feeling safe with a system.&lt;/p>
&lt;p>&lt;em>Here&amp;rsquo;s a hint: many NoSQL solutions don&amp;rsquo;t enforce these, which is where they get their speed from.&lt;/em>&lt;/p>
&lt;h3 id="consistent-means-for-accessing-data-sql" >
&lt;div>
Consistent means for accessing data (SQL)
&lt;/div>
&lt;/h3>
&lt;p>Many people complain about SQL, and while it&amp;rsquo;s not a perfect language, it is a common standard for accessing data. There are idioms that exist in Oracle that do not in Postgres, and ones that exist in MySQL that do not in SQL Server, but on the whole ANSI SQL is a common standard. This means if you learn one, in large part you learn another (from an application developer&amp;rsquo;s perspective). It also means you have a broader pool of people to pull from when you consider moving from one to another, and that skills are more portable. While Mongo may be growing, it&amp;rsquo;s in no way guaranteed to be around in 5 years, nor is CouchDB. In fact there have been many NoSQL databases that have come and gone:&lt;/p>
&lt;ul>
&lt;li>Tokyo Cabinet&lt;/li>
&lt;/ul>
&lt;h2 id="postgres" >
&lt;div>
Postgres
&lt;/div>
&lt;/h2>
&lt;p>So those are some shared traits that make databases great, but Postgres in particular aims to be the single database capable of ushering in the death of NoSQL. While each item could easily be its own blog post, hopefully the following calls out the key values and allows people to dive in deeper.&lt;/p>
&lt;h3 id="hstore" >
&lt;div>
HStore
&lt;/div>
&lt;/h3>
&lt;p>I strongly debated saving the best for last, but really just couldn&amp;rsquo;t wait. If there&amp;rsquo;s a single feature in Postgres that will kill NoSQL, it&amp;rsquo;s HStore. HStore is a datatype that allows you to store a dictionary of key/value pairs within Postgres.&lt;/p>
&lt;h3 id="custom-datatypes" >
&lt;div>
Custom datatypes
&lt;/div>
&lt;/h3>
&lt;h3 id="postgis" >
&lt;div>
PostGIS
&lt;/div>
&lt;/h3>
&lt;p>Location is all the buzz these days, and more and more applications have some touch of location involved in them.&lt;/p></description></item><item><title>How Heroku Works - Maker's Day</title><link>/2011/11/07/how-heroku-works-maker-day/</link><pubDate>Mon, 07 Nov 2011 12:55:56 -0800</pubDate><guid>/2011/11/07/how-heroku-works-maker-day/</guid><description>&lt;p>In my earlier post on &lt;a href="/2011/11/02/how-heroku-works-teams-tools/">Teams and Tools at Heroku&lt;/a>, I mentioned how we value engineers&amp;rsquo; time; their work has enabled us to build a great platform. As a result of what we&amp;rsquo;ve built, we&amp;rsquo;ve had great growth both of our platform and of our teams internally. With that growth inevitably comes different distractions on engineers&amp;rsquo; time. Despite how a manager may plan things, engineering work needs long periods of uninterrupted time. To ensure that no matter what, an engineer has plenty of opportunity to do the work he or she was hired to do, Heroku has Maker&amp;rsquo;s Day.&lt;/p>
&lt;p>&lt;em>Maker&amp;rsquo;s Day ensures that engineers get a full day of uninterrupted time to focus on making things.&lt;/em>&lt;/p>
&lt;p>&lt;img src="/images/makerday.png" alt="Maker&amp;rsquo;s Day">&lt;/p>
&lt;p>The more consistent interruptions are throughout an engineer&amp;rsquo;s day, the more time will be lost due to context switching, in addition to the time spent on those other activities. These interruptions may include a quick question from a manager, a question on a code problem someone else is working through, or an email or IM from a coworker. Regardless of the type of interruption, it causes an engineer to lose focus. According to &lt;a href="http://www.amazon.com/gp/product/0932633439/ref=as_li_tf_tl?ie=UTF8&amp;amp;tag=mypred-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0932633439">Peopleware: Productive Projects and Teams&lt;/a>, in a study regarding productivity among engineers, the top performers reported being regularly interrupted 38% of the time, versus 76% for the bottom performers. Context switching should be counted as fully wasted time for an engineer, and all too often as the number of meetings increases, the time lost to context switching increases similarly to the following:&lt;/p>
&lt;p>&lt;img src="/images/contextswitch.png" alt="Cost of context switching">&lt;/p>
&lt;p>For more on how interruptions and context switching decrease productivity, Jeff Atwood has a great post about &lt;a href="http://www.codinghorror.com/blog/2006/09/the-multi-tasking-myth.html">The Multi-Tasking Myth&lt;/a>.&lt;/p>
&lt;p>Most people understand that context switching is bad, but another team may still have valid demands on your time. Pushing back against another team or manager isn&amp;rsquo;t always feasible; after all, we do work together, and each team at times may need something from another team. This is where Maker&amp;rsquo;s Day starts to come in. Every Thursday at Heroku is Maker&amp;rsquo;s Day.&lt;/p>
&lt;p>&lt;strong>Maker&amp;rsquo;s day is meant for making shit. Meetings don&amp;rsquo;t happen on Maker&amp;rsquo;s Day&lt;/strong>. If someone asks if that time on your calendar works for a meeting, the simple response is no&amp;ndash;it&amp;rsquo;s Maker&amp;rsquo;s Day. Because Maker&amp;rsquo;s Day has been ingrained into our culture, engineers have no problem giving that response when there&amp;rsquo;s a request on their time on Maker&amp;rsquo;s Day. If someone in marketing, sales, or another non-engineering role wants to book meetings, they&amp;rsquo;re welcome to do so, but they&amp;rsquo;re going to be without engineers. However, even for non-engineers, Maker&amp;rsquo;s Day is equally invaluable; uninterrupted hours of focus at a time are amazing for productivity in any role.&lt;/p>
&lt;p>Maker&amp;rsquo;s Day varies in how it is executed from person to person. Often the office is slightly less busy due to some engineers working from home or coffee shops to maximize their productivity. To an outsider, the office may appear business as usual: engineers sit at their desks, working. At lunch, everyone is sitting around the lunch table eating together. To the unobservant eye it may appear to be just any other day, but the engineers notice the difference. There will be significantly fewer interruptions from someone walking over to your desk, you won&amp;rsquo;t be pulled into meetings that distract you from features, and you know it&amp;rsquo;s an opportunity to accomplish a bulk of work laid out from your weekly planning meeting.&lt;/p>
&lt;p>As Heroku has grown, meetings have increased, and the value of Maker&amp;rsquo;s Day has increased exponentially.&lt;/p>
&lt;p>Whether you&amp;rsquo;re in the early stages of bootstrapping a company or at a large company of thousands of engineers, one of the best practices anyone can put into place is dedicated quality time for engineers to produce code. Maker&amp;rsquo;s Day is a fantastic way to ensure this happens on a weekly basis.&lt;/p></description></item><item><title>How Heroku Works - Teams and Tools</title><link>/2011/11/02/how-heroku-works-teams-tools/</link><pubDate>Wed, 02 Nov 2011 12:55:56 -0800</pubDate><guid>/2011/11/02/how-heroku-works-teams-tools/</guid><description>&lt;p>Heroku is a largely agile company; we work primarily in small teams that talk via APIs and data contracts. It&amp;rsquo;s also a company comprised primarily of engineers; even product managers often write code. Many of the platform&amp;rsquo;s features are driven not from the top down, but from the bottom up based on engineers&amp;rsquo; desires or skunkworks projects. There are many valuable insights into how Heroku runs engineering efficiently.&lt;/p>
&lt;p>I&amp;rsquo;ll be diving into many various practices that enable Heroku to put quality engineering above all else, but first let me highlight the team structure and tools that enable this.&lt;/p>
&lt;p>Heroku is comprised of many small teams internally, each team operates much like an individual entity. The team chooses its own tools and best method for communication, though as a whole some form of Scrum is run throughout teams. Think of the unix philosophy of small sharp tools &lt;a href="http://www.amazon.com/gp/product/0131429019/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;amp;tag=mypred-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0131429019">as in The Art of Unix Programming&lt;/a> applied to teams and people.&lt;/p>
&lt;p>For most teams this involves a weekly planning meeting earlier in the week. In such a meeting teams may conduct a retrospective, identify opportunities to improve the process for the coming week, and of course plan tasks for the week. It&amp;rsquo;s very important to note that planning tasks for the week doesn&amp;rsquo;t necessarily involve planning deadlines for them, but rather simply laying out what people are working on (more on this in a future post). Each team will record and track this in a tool of their own choosing. Several use &lt;a href="http://www.pivotaltracker.com">pivotal tracker&lt;/a>, one uses &lt;a href="http://www.scrumy.com">scrumy&lt;/a>, some use email to distribute and track against personal to-do lists. The method for tracking issues is again entirely up to the individual team. A one-person team may choose to use a simple to-do list; larger teams commonly use &lt;a href="http://www.github.com">github&lt;/a> issues and pull requests. Given the team is the one responsible for their own productivity, the team is the one to choose what tools they use.&lt;/p>
&lt;p>Meeting loads vary from person to person depending on the demands on their time, though everyone at Heroku participates in some form of standup. Most teams do these daily as quick status stand-ups covering what was worked on the day before and what&amp;rsquo;s to be worked on the next day. In addition to the planning meeting and stand-ups, there is often collaborative engineering, and company-wide gatherings.&lt;/p>
&lt;p>Collaborative engineering once again varies depending on which engineers are working together. Engineers may get in front of a whiteboard or in front of machines and simply collaborate. For engineers together in the office this is often the most productive way. These practices work the same for remote employees, though slightly modified to keep the interaction high touch. For remote employees this often works as pair programming via Skype. Skype is indispensable for allowing remote workers to feel far less remote. Pair Skype with &lt;a href="http://typewith.me/">typewith.me&lt;/a> and you have an unbelievably easy way to collaborate, not just with one other person but with multiple parties, to work through a document on a given topic. For smaller communications, asynchronous is key. This ranges from &lt;a href="http://campfirenow.com/">campfire&lt;/a>, most commonly during common working hours when someone is likely to be at a machine, to email when the return on a request may take slightly longer.&lt;/p>
&lt;p>Finally there is the all-common company-wide meeting, which occurs weekly. The structure of this varies from status updates to broader goings-on. It&amp;rsquo;s often the perfect time for engineers to hear about what sales is doing or get updates on teams you don&amp;rsquo;t commonly interact with. Along with common status updates there will be broader company updates.&lt;/p>
&lt;p>Consistently across all teams you&amp;rsquo;ll find these principles which allow us to ensure the quality of engineering as we continue to grow:&lt;/p>
&lt;ul>
&lt;li>Small teams that talk across defined APIs and data contracts&lt;/li>
&lt;li>Teams using the tool that they believe is best for the job&lt;/li>
&lt;li>Frequent asynchronous communication&lt;/li>
&lt;li>Collaboration (including for remote employees)&lt;/li>
&lt;/ul>
&lt;p>The key to Heroku running efficiently is allowing each team to run as it chooses. Heroku works because we have talented engineers; the best thing we can do for those engineers is allow them to work productively. Often only they know the best way to accomplish this, so who better to let them accomplish it than themselves?&lt;/p></description></item><item><title>Installing Python Packages</title><link>/2011/11/01/Installing-Python-Packages/</link><pubDate>Tue, 01 Nov 2011 12:55:56 -0800</pubDate><guid>/2011/11/01/Installing-Python-Packages/</guid><description>&lt;p>Now that you have your system and project environment all set up, you probably want to start developing. But you likely don&amp;rsquo;t want to write an entire project fully from scratch; as you dive in you&amp;rsquo;ll quickly realize there are many tools helping you build projects and sites faster. For example, for making requests to a website there&amp;rsquo;s &lt;a href="http://docs.python-requests.org/en/latest/index.html">Requests&lt;/a>, for processing images there&amp;rsquo;s &lt;a href="http://www.pythonware.com/products/pil/">Python Imaging Library&lt;/a>, or for a full framework to help you in building a site there&amp;rsquo;s &lt;a href="http://www.djangoproject.com">Django&lt;/a>. With all of these there&amp;rsquo;s one simple and common way to install them. But first, a little more on how it all works.&lt;/p>
&lt;p>All major Python packages are hosted on &lt;a href="http://pypi.python.org/">PyPi&lt;/a> (Pronounced Pi-P or Cheeseshop). When you use a common python installer it will:&lt;/p>
&lt;ol>
&lt;li>Search for the package you specify&lt;/li>
&lt;li>Use the version you specify, otherwise the latest&lt;/li>
&lt;li>Download the source for that package&lt;/li>
&lt;li>Install it into your Python environment&lt;/li>
&lt;/ol>
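The four steps above can be sketched as a toy installer in a few lines of Python. The in-memory index, the version strings, and the install function are all made up for illustration; they stand in for PyPI and a real installer like pip:

```python
# Toy sketch of what a package installer does (steps 1-4 above),
# using an in-memory "index" in place of PyPI. Purely illustrative.
index = {
    "requests": {"1.0": "<source for requests 1.0>",
                 "0.9": "<source for requests 0.9>"},
}

def install(name, version=None):
    releases = index[name]              # 1. search for the package you specify
    version = version or max(releases)  # 2. use the requested version, else latest
    source = releases[version]          # 3. "download" the source for that package
    return f"installed {name} {version}"  # 4. install it into the environment

print(install("requests"))         # installed requests 1.0
print(install("requests", "0.9"))  # installed requests 0.9
```

A real installer additionally resolves dependencies and builds the package, but the search/select/download/install flow is the same.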
&lt;p>Now for actually installing&amp;hellip; Let&amp;rsquo;s get started by installing the three packages below. At this point you should at least have a fresh Python environment; however, you don&amp;rsquo;t yet have an immediate way to install packages. The de facto Python package installer is &lt;a href="http://pip-installer.org">pip&lt;/a>.&lt;/p>
&lt;p>Earlier we set up virtualenv to help isolate the Python packages we work with. First let&amp;rsquo;s create a folder for our project, then set up a new environment for the project we&amp;rsquo;ll work on:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ mkdir myapp
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ cd myapp
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ virtualenv --no-site-packages venv
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If we list the contents of the directory you&amp;rsquo;ll now see a folder venv. Within this folder you&amp;rsquo;ll find all the parts of the environment that virtualenv just created:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ ls
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>venv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ ls venv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>bin include lib
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now you&amp;rsquo;ve got a sandboxed environment that exists but you haven&amp;rsquo;t loaded it. You can now activate and deactivate this any time you like. Once you do this it customizes your path to use the packages you&amp;rsquo;ve installed for this environment. To load your environment when in the myapp directory:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ source venv/bin/activate
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To deactivate, simply:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ deactivate
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now that you&amp;rsquo;ve got your environment loaded, installing packages should be pretty simple. Ensure that your virtualenv is loaded and then run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ pip install requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ pip install PIL
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ pip install Django
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now that you&amp;rsquo;ve installed your packages you want to be able to share this with others to make it easy to get set up. You could manually provide a list of everything your application needs to run, or, because it&amp;rsquo;s Python, you can expect it to make this easy for you. Pip has a wonderful command, &lt;code>freeze&lt;/code>, that will show all of the packages and versions that are installed. Simply run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ pip freeze
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, this only outputs the information. Along with this, pip has a canonical format for listing requirements in a file and installing them from it (via &lt;code>pip install -r requirements.txt&lt;/code>). The file is commonly named requirements.txt. To create it we simply pipe the results of pip freeze to the file.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ pip freeze &amp;gt; requirements.txt
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Next we&amp;rsquo;ll talk about a few more advanced items in dependency management, then finally we&amp;rsquo;ll get started on building an application.&lt;/p></description></item><item><title>Getting Setup with Python</title><link>/2011/10/27/Getting-Setup-with-Python/</link><pubDate>Thu, 27 Oct 2011 19:55:56 -0800</pubDate><guid>/2011/10/27/Getting-Setup-with-Python/</guid><description>&lt;p>This is the first of a multipart series on getting started with Python. Throughout this guide we&amp;rsquo;ll walk you through a full setup. For starters, if you&amp;rsquo;re a Mac or Linux user you already have &lt;a href="http://python.org">Python&lt;/a> on your system. You should be able to confirm you have Python by opening up a terminal window and running:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ python --version
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Python 2.7.2
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>As long as you see a Python version 2.5.x-2.7.x you should be fine to continue. From here we&amp;rsquo;re going to work through setting up your Python project environment. For this we&amp;rsquo;re going to use &lt;a href="http://virtualenv.org">virtualenv&lt;/a>. For those of you not familiar, &lt;a href="http://virtualenv.org">virtualenv&lt;/a> is a self-contained Python environment. It holds its own copy of Python and any libraries you install. This allows you to work on multiple projects with different versions of libraries.&lt;/p>
&lt;p>While we&amp;rsquo;re installing virtualenv we&amp;rsquo;re also going to go ahead and set up PostgreSQL, as we&amp;rsquo;ll be using it later. If you&amp;rsquo;re on a Mac you&amp;rsquo;ll first need to set up Homebrew, which is used for installing various system packages. If you&amp;rsquo;re on Linux, in particular Ubuntu, you can skip down to the steps for setting up your environment.&lt;/p>
&lt;p>First, for Mac users, let&amp;rsquo;s set up Homebrew, which will allow us to install various system packages:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ /usr/bin/ruby -e &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#66d9ef">$(&lt;/span>curl -fsSL https://raw.github.com/gist/323731&lt;span style="color:#66d9ef">)&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;!-- raw HTML omitted -->
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ curl -O http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11-py2.7.egg | sh setuptools-0.6c11-py2.7.egg
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ easy_install virtualenv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ brew install postgresql
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;!-- raw HTML omitted -->
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ sudo apt-get install virtualenv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ sudo apt-get install postgresql
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now you have Python, virtualenv, and PostgreSQL all installed. We can now focus on setting up the initial structure of a project.&lt;/p>
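&lt;p>Before moving on, it can be worth a quick sanity check that everything above actually landed on your PATH. A minimal sketch, assuming the common tool names (psql may live under a versioned path on some installs):&lt;/p>

```shell
# Check that the tools from this guide are available on the PATH.
# Tool names here are the common defaults; adjust for your platform.
missing=0
for tool in python virtualenv psql; do
  if command -v "$tool" >/dev/null 2>/dev/null; then
    echo "$tool: found"
  else
    echo "$tool: missing"
    missing=$((missing + 1))
  fi
done
echo "missing tools: $missing"
```

&lt;p>If anything comes back missing, re-run the corresponding install step above before continuing.&lt;/p>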
&lt;p>In the next part we can start with installing some Python packages.&lt;/p></description></item><item><title>Environment Structure for Django Apps</title><link>/2011/05/16/Environment-Structure-for-Django-Apps/</link><pubDate>Mon, 16 May 2011 19:55:56 -0800</pubDate><guid>/2011/05/16/Environment-Structure-for-Django-Apps/</guid><description>&lt;p>I&amp;rsquo;ve been writing applications off and on for nearly 4 years now, since before Django 1.0 was even released. I must say the framework could not be better described than by its own tagline &amp;ldquo;The Web framework for perfectionists with deadlines&amp;rdquo;. Among the things I love about it are:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>However, after using the framework for nearly 4 years I&amp;rsquo;m just now discovering my preferred way of managing environments. I know there&amp;rsquo;s still a bit of back and forth on development environment/IDE, but as far as configuring the actual project environment goes, I&amp;rsquo;ve become very comfortable with what I&amp;rsquo;ve now been using for many months. It also allows someone else to bootstrap their environment incredibly quickly. Below is a quick cookbook of how to do this on OSX and Ubuntu.&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ sudo port install python27
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ sudo port install py27-virtualenv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ sudo port install postgresql90
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now for setting up your project:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ mkdir example
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ cd example
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ virtualenv-2.7 --no-site-packages .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ source bin/activate
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;!-- raw HTML omitted -->
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ sudo brew install python
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ sudo brew install virtualenv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ sudo brew install postgresql
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now for setting up your project:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ mkdir example
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ cd example
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ virtualenv --no-site-packages .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ source bin/activate
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;!-- raw HTML omitted -->
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ sudo aptitude install python
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ sudo aptitude install virtualenv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ sudo aptitude install postgresql
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now for setting up your project:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ mkdir example
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ cd example
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ virtualenv --no-site-packages .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ source bin/activate
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For those of you not familiar, virtualenv is a self-contained Python environment. It holds its own copy of Python and any libraries you install. Now that you&amp;rsquo;ve set up your virtualenv we can go through the process of installing Django and setting up your repository. This is the same across all of the above platforms:&lt;/p>
&lt;p>Add to a .gitignore file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>bin
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>build
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>include
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>lib
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>.Python
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>*.pyc
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Add to a requirements.txt file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>Django&lt;span style="color:#f92672">==&lt;/span>1.3.1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>psycopg2&lt;span style="color:#f92672">==&lt;/span>2.4.1
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ bin/pip install -r requirements.txt
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This will install all of your required packages and makes it easy for others to do the same when they begin contributing to a larger app. Finally, if you like, you can create your git repo from this and make your first commit:&lt;/p>
&lt;pre>&lt;code>$ git init
$ git add .
$ git commit -m 'my first django virtualenv'
&lt;/code>&lt;/pre></description></item><item><title>Attribution 101</title><link>/2011/03/18/Attribution-101/</link><pubDate>Fri, 18 Mar 2011 19:55:56 -0800</pubDate><guid>/2011/03/18/Attribution-101/</guid><description>&lt;p>Continuing with the recent posts on metrics and marketing, I want to give a quick primer on attribution. To any marketing or analytics people out there: feel free to skip this, as it would be a basic recap at best for you.&lt;/p>
&lt;p>The very general meaning behind attribution is to give credit. When it comes to web products this can be giving credit for lots of things:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>It&amp;rsquo;s really wonderful when there&amp;rsquo;s a direct mapping in correlation. Take for example the case where:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>&lt;!-- raw HTML omitted -->Disclaimer: I have little to no formal training in stats or analytics; I&amp;rsquo;ve simply learned through launching products, so take this for what it&amp;rsquo;s worth: advice from someone that has been there and done it.&lt;!-- raw HTML omitted -->&lt;/p></description></item><item><title>Startup/Bootstrapped Marketing Recap</title><link>/2011/03/07/Startup/Bootstrapped-Marketing-Recap/</link><pubDate>Mon, 07 Mar 2011 19:55:56 -0800</pubDate><guid>/2011/03/07/Startup/Bootstrapped-Marketing-Recap/</guid><description>&lt;p>If you have an hour to spare it&amp;rsquo;s well worth it to look back at my series on startup/bootstrapped marketing. But if you&amp;rsquo;re short on time and want the high-level summary, here&amp;rsquo;s the quick recap:&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->Part 1 - Focus on SEO&lt;!-- raw HTML omitted -->&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted --></description></item><item><title>Reading Metrics to Evaluate Marketing</title><link>/2011/03/02/Reading-Metrics-to-Evaluate-Marketing/</link><pubDate>Wed, 02 Mar 2011 19:55:56 -0800</pubDate><guid>/2011/03/02/Reading-Metrics-to-Evaluate-Marketing/</guid><description>&lt;p>A short while back I talked about &lt;!-- raw HTML omitted -->tactically measuring metrics&lt;!-- raw HTML omitted --> for your site/company. Recently I talked a bit about &lt;!-- raw HTML omitted -->methods of marketing&lt;!-- raw HTML omitted -->. A large key to getting the most out of your time and money is to properly report against the intersection of these two items. First, I&amp;rsquo;m going to assume you&amp;rsquo;ve read those posts; if you haven&amp;rsquo;t, go back and do that. This also relies heavily on the assumption that you&amp;rsquo;re using Google Analytics as your primary tool for measuring metrics and have set up goals appropriately.&lt;/p>
&lt;p>As you measure your metrics you&amp;rsquo;ll have abandonment at each level. You may have some visitors that never register, and many that register but never purchase anything. It&amp;rsquo;s wonderful if you&amp;rsquo;re able to immediately have full insight into the best means of marketing to drive revenue, however realistically it occurs in a more phased approach. The first step is to drive visitors and the almost immediate second is to convert those visitors into registered users.&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->Acquisition -&amp;gt; Activation&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>This at a high level makes sense, as does knowing you should be targeting users in your target market. However, even slightly drilling into this you realize that all traffic is not equal. It&amp;rsquo;s often known that CPC and CPA advertising does not convert well for users, though it can drive traffic. This may not always be the case. For Registry Stop, to make this analysis easier, we&amp;rsquo;ve created a custom report in Google Analytics.&lt;/p>
&lt;p>To create your own custom report simply click Manage Custom Reports under the Custom Reports area, then &amp;ldquo;Create new custom report&amp;rdquo;. Custom reports in Google Analytics give you much more power to drill into the data that you already have at a higher level. To track how effectively you convert visitors to registered users, and which sources are effective at this, you&amp;rsquo;d create something that looks like:&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>Here, while very simple, we&amp;rsquo;re able to see some very key information quickly. Here&amp;rsquo;s an example of how it would appear over a few-day period:&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>As you progress in your stage of bootstrapping or growing your startup you&amp;rsquo;ll want to grow custom reports that allow you to report against as many &lt;!-- raw HTML omitted -->metrics&lt;!-- raw HTML omitted --> as possible. The first and last part of my day is spent poring over these reports. Having this data readily available allows us to drive our business based on data. Perhaps the hardest part of all of this is admitting when the data is counter to what we expect and following its advice.&lt;/p></description></item><item><title>Setting up Goals and Funnels - Google Analytics</title><link>/2011/02/28/Setting-up-Goals-and-Funnels-Google-Analytics/</link><pubDate>Mon, 28 Feb 2011 19:55:56 -0800</pubDate><guid>/2011/02/28/Setting-up-Goals-and-Funnels-Google-Analytics/</guid><description>&lt;p>I had a recent request on how to set up a funnel in Google Analytics. If you&amp;rsquo;ve missed my first post with some tips for &lt;!-- raw HTML omitted -->Google Analytics&lt;!-- raw HTML omitted -->, first check that out. With most websites today there is some portion of the site that is event and not page based, meaning you have some workflow on the page based on Javascript. If this is the case you&amp;rsquo;ll want to &lt;!-- raw HTML omitted -->fake a page view&lt;!-- raw HTML omitted --> instead of an event in order to fully use it in funnels and goals.&lt;/p>
&lt;p>A personal recommendation is actually to use both goals and funnels. The key to a funnel is that you need to have successive steps that occur in some order. With regards to metrics tracking this is absolutely needed, but typically you may have 1-2 total funnels with many steps on your site, versus goals, where you could have 10-15 single goals. For &lt;!-- raw HTML omitted -->Registry Stop&lt;!-- raw HTML omitted --> we&amp;rsquo;ve structured our site so that our earlier stage goals become the same as steps in later stage funnels. For us in almost all cases the first part of the funnel is the visit, the second is registering for an account. We do have independent goals for visits and registrations as well, but we do not have funnels on those goals.&lt;/p>
&lt;p>A key to getting the most use out of your funnels is to know that there is a &lt;!-- raw HTML omitted -->workflow to follow&lt;!-- raw HTML omitted --> to get to that end goal. To highlight this slightly more visually let me walk through an example:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>The parts of your funnel will be:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>Once here it becomes a bit more intuitive. You can begin by simply adding a goal. Your goal types should seem mostly intuitive; as mentioned in an earlier post you cannot use an Event in a goal. For this reason you can fake a pageview as if it actually occurred and then create your goal against that non-existent page view. If you want a few more details of how to do this check out the previous post. So for an example, we have a goal on &lt;!-- raw HTML omitted -->Registry Stop&lt;!-- raw HTML omitted --> that detects when a page view occurs as a result of a registry being synced. Because this workflow is heavily Javascript and flow based we fake the page view and track it as if the page was actually visited. The goal itself looks like:&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>For setting up your funnel, as we mentioned above it&amp;rsquo;s generally a set of page views. A very key item is the check box next to the first item in your funnel. If you check this, it means any other steps in the funnel are not counted unless the first step is completed. If you have a very structured 1, 2, 3 workflow this makes sense. However, if there are various ways for the goal to complete then be very careful about selecting this.&lt;/p>
&lt;p>For this same goal above we have a corresponding funnel to track in detail how our conversion flows. The funnel itself looks like this once setup:&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>This results in a funnel report that looks like:&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p> &lt;/p></description></item><item><title>Evaluating Paying for a Blog Post</title><link>/2011/02/25/Evaluating-Paying-for-a-Blog-Post/</link><pubDate>Fri, 25 Feb 2011 19:55:56 -0800</pubDate><guid>/2011/02/25/Evaluating-Paying-for-a-Blog-Post/</guid><description>&lt;p>At a recent meetup I talked a bit about how I&amp;rsquo;d been using blog posts on other blogs, both free and paid for, as a primary user acquisition tool. I was very shocked when several were surprised and curious about the method for this. In tech startups coverage is common, but it&amp;rsquo;s usually just that: press, not paid-for press. I must say I love how the tech community doesn&amp;rsquo;t force people to pay to get the word out, but it is very much a competition; that might be just as much work as paying.&lt;/p>
&lt;p>In contrast the wedding industry is very much a pay to play space. If you give some money you can get some attention.&lt;/p>
&lt;p>First things first, contact the blog you&amp;rsquo;re interested in being written up on and ask for their media kit. If they welcome sponsored posts, it is likely called out in their media kit. However, this isn&amp;rsquo;t always the case; if you&amp;rsquo;ve noticed sponsored posts on their blog but pricing isn&amp;rsquo;t called out, then email them explicitly and inquire.&lt;/p>
&lt;p>Once you&amp;rsquo;ve got their media kit it&amp;rsquo;s time to do some digging. My process has first been to validate their numbers. Most blogs include their unique visitors and page views in their media kits. I immediately jump over to Compete to check if their numbers are even in the ballpark. To clarify, in the ballpark can be somewhere around 1.5x of Compete&amp;rsquo;s figures. There are a surprising number of blogs that may be 20x off in the numbers they are stating. This could mean you&amp;rsquo;re outright lying on your stats, or that you&amp;rsquo;re not running a solid enough business that you know how to effectively track your numbers. It could be because your blog exists on 5 different domains, while on Compete I&amp;rsquo;m checking only the primary. Regardless, if your numbers aren&amp;rsquo;t close, it often means you&amp;rsquo;re not as together as we&amp;rsquo;d like.&lt;/p>
&lt;p>If they pass the first smoke test of the stats being in the ballpark, then we can move on to evaluating sponsoring a post. Traffic&amp;rsquo;s a big factor; unique visitors are important as well as page views. Next we&amp;rsquo;ll typically look at how active their users are. Do users actively engage in comments? Where there are active commenters there&amp;rsquo;s usually an opportunity to get a bit more out of your post.&lt;/p>
&lt;p>Next would be post frequency. Are you looking at 1-2 posts that go up per day, or 10? If 10, it simply means your content will be pushed to the bottom of the page pretty quickly; in this case, if page views are exceptionally high, it may mean that users only view 1-2 posts per day and miss the others. I&amp;rsquo;ve historically done this on a subjective basis, but it could easily be a number that is calculated and factored in.&lt;/p>
&lt;p>So you start with this basic methodology for one blog, then do it for a few more. It&amp;rsquo;s pretty simple to compare 3-4 blogs on their potential value, but when you really start expanding this you could be looking at 100 blogs. If that&amp;rsquo;s the case it does help to have some structured method. We typically weight their unique visitors, page views, a factor of how accurate they are against Compete, their commenter level, and finally their post frequency. We multiply that weight against the cost of a sponsored post and there you have your priority in terms of which blogs to begin advertising on. It&amp;rsquo;s usually best to try 2-3 blogs to determine your return. But even one can give you an idea of results.&lt;/p>
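&lt;p>To make the weighting concrete, here&amp;rsquo;s a minimal sketch of that kind of scoring in Python. The weights, field names, and example numbers are all illustrative assumptions, not our actual formula:&lt;/p>

```python
# Hypothetical sketch of scoring sponsored-post candidates.
# Weights and example numbers are made up for illustration.
def blog_score(uniques, pageviews, accuracy, commenter_level, posts_per_day, post_cost):
    # High-frequency blogs push your post off the page faster, so discount them.
    frequency_factor = 1.0 / max(posts_per_day, 1)
    weight = (0.4 * uniques + 0.2 * pageviews) * accuracy * commenter_level * frequency_factor
    # Divide by cost to get value per dollar: higher means a better buy.
    return weight / post_cost

candidates = {
    "blog_a": blog_score(50000, 120000, 0.9, 1.2, 2, 500),
    "blog_b": blog_score(20000, 90000, 1.0, 1.5, 10, 250),
}
best = max(candidates, key=candidates.get)
print(best)
```

&lt;p>Sorting the candidates by that score gives you the priority order for which blogs to try first.&lt;/p>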
&lt;p>A quick recap of the basic formula:&lt;/p>
&lt;!-- raw HTML omitted --></description></item><item><title>JQuery and Django Autocomplete</title><link>/2011/02/25/JQuery-and-Django-Autocomplete/</link><pubDate>Fri, 25 Feb 2011 19:55:56 -0800</pubDate><guid>/2011/02/25/JQuery-and-Django-Autocomplete/</guid><description>&lt;p>In a couple of places I&amp;rsquo;ve seen requests for how to add autocomplete to a Django web application. Here&amp;rsquo;s a really lightweight version with a view and autocomplete functionality using:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> django.utils &lt;span style="color:#f92672">import&lt;/span> simplejson
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">autocompleteModel&lt;/span>(request):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> search_qs &lt;span style="color:#f92672">=&lt;/span> ModelName&lt;span style="color:#f92672">.&lt;/span>objects&lt;span style="color:#f92672">.&lt;/span>filter(name__startswith&lt;span style="color:#f92672">=&lt;/span>request&lt;span style="color:#f92672">.&lt;/span>REQUEST[&lt;span style="color:#e6db74">&amp;#39;search&amp;#39;&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> results &lt;span style="color:#f92672">=&lt;/span> []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">for&lt;/span> r &lt;span style="color:#f92672">in&lt;/span> search_qs:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> results&lt;span style="color:#f92672">.&lt;/span>append(r&lt;span style="color:#f92672">.&lt;/span>name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    resp &lt;span style="color:#f92672">=&lt;/span> request&lt;span style="color:#f92672">.&lt;/span>REQUEST[&lt;span style="color:#e6db74">&amp;#39;callback&amp;#39;&lt;/span>] &lt;span style="color:#f92672">+&lt;/span> &lt;span style="color:#e6db74">&amp;#39;(&amp;#39;&lt;/span> &lt;span style="color:#f92672">+&lt;/span> simplejson&lt;span style="color:#f92672">.&lt;/span>dumps(results) &lt;span style="color:#f92672">+&lt;/span> &lt;span style="color:#e6db74">&amp;#39;);&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> HttpResponse(resp, content_type&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;application/json&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For the jQuery autocomplete and call:&lt;/p></description></item><item><title>Bootstrapped/Startup Marketing Part 4</title><link>/2011/02/22/Bootstrapped/Startup-Marketing-Part-4/</link><pubDate>Tue, 22 Feb 2011 19:55:56 -0800</pubDate><guid>/2011/02/22/Bootstrapped/Startup-Marketing-Part-4/</guid><description>&lt;p>We&amp;rsquo;ve talked some about &lt;!-- raw HTML omitted -->SEO&lt;!-- raw HTML omitted -->, &lt;!-- raw HTML omitted -->media/blog posts&lt;!-- raw HTML omitted -->, and AdWords; no one of these is a magic bullet. Some work better for different reasons. As I mentioned in the first post, if you haven&amp;rsquo;t checked out the post on &lt;!-- raw HTML omitted -->tactically measuring metrics&lt;!-- raw HTML omitted --> then please do. If you have followed those steps and explored each of these options, then you should have an idea of which ones work well for you and which don&amp;rsquo;t. The final piece of marketing may be a bit harder to measure, but is going to do great things towards growing your brand to users and visitors.&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->Retargeting&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->
Retargeting is the idea of showing an ad to a user that has already visited your site. It&amp;rsquo;s pretty basic: someone comes to your site and you have a pixel that loads, telling your ad network they&amp;rsquo;ve visited. From then on they may see your ad when randomly browsing the web. With my most recent venture into the online registry space I was browsing the Chicago Tribune and Slashdot and came across our ads. There&amp;rsquo;s absolutely no contextual relevance there, but I checked and it was due to our retargeting. This will make it appear as if you are everywhere to your users. If you do want to heavily monitor what retargeting is doing for you, the best place to do it is around your retention metric.&lt;/p>
&lt;p>How do you do retargeting? It sounds like a slightly complicated process&amp;hellip; Well, it&amp;rsquo;s simple: you don&amp;rsquo;t; let one of the major ad networks do it for you. The first step is to use image ads. Text-based ads may work great for Google and Facebook, but on most of the websites your retargeting will run on, you want to have the images that the blogs typically serve. The 3 key form factors you&amp;rsquo;ll want are:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>If you want your shop to seem like a mom and pop site that has 100 visitors a month, then retargeting will hold absolutely no value for you. But even if you do only have 100 visitors a month, with retargeting they&amp;rsquo;ll get some confidence that you&amp;rsquo;re a real online brand with a presence.&lt;/p></description></item><item><title>Bootstrapped/Startup Marketing Part 3</title><link>/2011/02/18/Bootstrapped/Startup-Marketing-Part-3/</link><pubDate>Fri, 18 Feb 2011 19:55:56 -0800</pubDate><guid>/2011/02/18/Bootstrapped/Startup-Marketing-Part-3/</guid><description>&lt;p>For this third part of the series I&amp;rsquo;m going to dive into what people perhaps most traditionally think of with marketing startups: online advertising. Online advertising can work, but it&amp;rsquo;s definitely not cheap and it does take a good amount of pounding at it to know what works. I&amp;rsquo;m going to break down the three key types of advertising, based on the way I&amp;rsquo;ve utilized and evaluated them recently.&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->Contextual&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->
The first is search advertising, or contextual. The biggest usage of this has of course most recently been with Google AdWords. For the complete novices out there, this is where your ad would appear as text based on certain search keywords. You&amp;rsquo;re the one that determines the keywords, though Google does rank you for relevance and how much you are willing to pay relative to the other people bidding on the same keywords.&lt;/p>
&lt;p>For a startup I&amp;rsquo;d very strongly discourage advertising on keywords where you already appear as the first result. A part of this goes back to &lt;!-- raw HTML omitted -->part one of the series&lt;!-- raw HTML omitted -->: SEO doesn&amp;rsquo;t cost you anything other than time, so invest in it. If you&amp;rsquo;re already at the top of the results, why potentially pay $2 for a click when you&amp;rsquo;re already there? With a limited budget you want to be very selective about which keywords you target. In our experience the best targeting can be with regards to your competitors or similar products. Also take advantage of Google&amp;rsquo;s tools here. There are many tools within AdWords that will suggest new keywords, rank your relevance, and show traffic to certain keywords.&lt;/p>
&lt;p>Always remember, &lt;!-- raw HTML omitted -->what you really want is conversion&lt;!-- raw HTML omitted -->, so make sure that&amp;rsquo;s what you&amp;rsquo;re tracking against performance. The key to doing this is linking your Google AdWords account to your Google Analytics. Google mentions in several places that you should do this, but is very light on instructions for how. So for a really quick how-to:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>We followed a rather structured process to identify what worked. If there&amp;rsquo;s enough interest in this specifically I&amp;rsquo;ll do a full follow up post later but the high level steps:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->Vertical&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->
Most people term this display advertising; display is typically a banner/image ad going up on some website. I prefer viewing it as vertical, because if you target your display advertising correctly on sites you want to be on, it&amp;rsquo;s definitely more targeted than a standard billboard. Google does have some options for display advertising, but I&amp;rsquo;ve found their image approval process very painful.&lt;/p>
&lt;p>With display or vertical advertising there are a few keys I&amp;rsquo;ve found valuable:&lt;/p>
&lt;ol>
&lt;li>Image approval process matters, as you don&amp;rsquo;t have multiple weeks to spare&lt;/li>
&lt;li>The ability to target/select your sites is an obvious must&lt;/li>
&lt;li>Filtering out MFA (Made for AdSense) sites, in case you do open it up slightly by site type&lt;/li>
&lt;/ol>
&lt;p>On the third item there are two ways of doing this: you can manually check daily in Google and exclude sites. What you&amp;rsquo;re looking for here are sites with &amp;lt; 100 impressions and abnormally high click-through rates. The other option for managing and tweaking these vertical campaigns is to let someone else do much of the optimization. My recent favorite for this is &lt;!-- raw HTML omitted -->AdRoll&lt;!-- raw HTML omitted -->. There&amp;rsquo;s definitely a slight premium over going directly to Google, but they do let you easily get your campaign running and very much help optimize.&lt;/p></description></item><item><title>Bootstrapped/Startup Marketing Part 2</title><link>/2011/02/16/Bootstrapped/Startup-Marketing-Part-2/</link><pubDate>Wed, 16 Feb 2011 19:55:56 -0800</pubDate><guid>/2011/02/16/Bootstrapped/Startup-Marketing-Part-2/</guid><description>&lt;p>For the second part of the series we&amp;rsquo;re going to talk a bit about finding the influencers in certain industries. We&amp;rsquo;ll get to the more traditional means people think of later, and if you&amp;rsquo;ve missed our first post, which dealt mostly with SEO, make sure you check it out first. In most online ventures there&amp;rsquo;s a key set of influencers; oftentimes these are blogs or podcasts. Blogs can have huge readerships, which are often very loyal.&lt;/p>
&lt;p>The first step to taking advantage of this is obviously finding the correct blogs. As a byproduct of being in the valley I spend plenty of time reading Techcrunch among many other blogs. While a post on Techcrunch might result in a massive traffic spike, or some moderate feedback, it&amp;rsquo;s definitely not in our target demographic. If our goal were investors then Techcrunch might be viable, but here we&amp;rsquo;re talking about marketing, and marketing to core users at that. The blogs for your core demographic should be pretty straightforward to find through Google or other basic means.&lt;/p>
&lt;p>The hard part, once you&amp;rsquo;ve found them, is how to get on them. The first thing to do is work on networking with them if at all possible. If you&amp;rsquo;re in the same area as bloggers, see if they typically attend certain meetups, or go one step further and simply organize a meetup or tweetup yourself. Or simply network: find others in your space that can make an intro. Hands down, a first-person intro will work better than any other method.&lt;/p>
&lt;p>If for whatever reason networking and getting directly in touch doesn&amp;rsquo;t work (which can often be the case), you can always resort to cold emailing them. This can work. There are a few keys to this approach though:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>The next step will bleed a bit more into the next post in this series, but here&amp;rsquo;s a small preview: Advertise to your influencers, not your users. If you have a limited ad budget, don&amp;rsquo;t advertise to your target user. Advertise to the influencer of your target user. We&amp;rsquo;ll get into more detail on this in the next post but a quick example:&lt;/p>
&lt;p>If you&amp;rsquo;re building a sports site, why advertise on Facebook to people who like sports &lt;!-- raw HTML omitted -->when&lt;!-- raw HTML omitted --> you could advertise to someone who works at espn.com? If the latter notices you and sees interest, they could potentially write you up or refer you to people that might be of interest.&lt;/p></description></item><item><title>Bootstrapped/Startup Marketing Part 1</title><link>/2011/02/14/Bootstrapped/Startup-Marketing-Part-1/</link><pubDate>Mon, 14 Feb 2011 19:55:56 -0800</pubDate><guid>/2011/02/14/Bootstrapped/Startup-Marketing-Part-1/</guid><description>&lt;p>This is the first of a 4 part series on marketing for startups/bootstrapped companies. Much of the learning here is a result of experiences with Registry Stop. The key to each of these is going to be measuring and reacting to your efforts. If you need help with this, check out the previous post on metrics for startups.&lt;/p>
&lt;p>So without further ado, in this initial post of the series we&amp;rsquo;re going to talk a bit about the biggest free way to get traction and traffic for your startup. The best way to acquire free traffic to your site is to ensure your site is optimized for search engines, more commonly known as &lt;!-- raw HTML omitted -->SEO&lt;!-- raw HTML omitted -->. Sure, you can pay $3 for your ad to show up on certain keywords, but why spend the $3 per click if you can simply ensure you&amp;rsquo;re the first search result? There are slightly different methods for each search engine, but we&amp;rsquo;ll cover a broad set of items to pay attention to.&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->Sitemap&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->
Most sites have a sitemap.xml at their root level. This is perhaps one of the biggest pieces of getting indexed that you can pay attention to. This XML tells search engines what pages they should index, how frequently they are updated, and the priority of each page. If you have dynamic pages, you should have this sitemap.xml generated so that it captures all pages.&lt;/p>
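As an illustration, a minimal sitemap.xml following the sitemaps.org protocol; the URLs, dates, and priority values below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2011-02-14</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/about</loc>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```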
&lt;p>If you need a little more reading on creating your sitemap take a look at:
&lt;!-- raw HTML omitted --> &lt;a href="http://www.google.com/support/webmasters/bin/answer.py?answer=183668">http://www.google.com/support/webmasters/bin/answer.py?answer=183668&lt;/a>&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->Meta Tags&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->
Many search engines pay little attention to these tags, but that doesn&amp;rsquo;t mean all of them don&amp;rsquo;t. You do want these tags to be as unique as possible per page and relate as closely as possible to the content. The really key meta tags to have are your description and your keywords. For a little more information you can check out:
&lt;!-- raw HTML omitted --> &lt;a href="http://searchenginewatch.com/2167931">http://searchenginewatch.com/2167931&lt;/a>&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->Other Tags&lt;!-- raw HTML omitted -->&lt;!-- raw HTML omitted -->
While meta tags make pretty straightforward sense, other tags are immensely powerful in how a search engine indexes your site. The first is the title tag: you absolutely want it in &lt;!-- raw HTML omitted -->and&lt;!-- raw HTML omitted --> want it to be &lt;!-- raw HTML omitted -->unique&lt;!-- raw HTML omitted --> per page. You also want to use HTML markup correctly throughout your page. Just because your page looks the way you want doesn&amp;rsquo;t mean a search engine will understand it the same way. It&amp;rsquo;s important to use proper headings, including h1, h2, and h3, to show just that: your headings and their importance. You also want to be careful about the use of tables when not representing tabular data.&lt;/p>
&lt;p>Once your page is live, you&amp;rsquo;ll want to submit it to Google and other search engines. &lt;!-- raw HTML omitted -->Do Not&lt;!-- raw HTML omitted --> use a tool to do this; submit to the major search engines manually. Once you&amp;rsquo;ve done this you&amp;rsquo;ll start to appear in search results. The key from here is to begin monitoring, from within Google Analytics, what users are searching for when they visit your site.
A few key links to submit and manage your sites, as well as evaluate how you&amp;rsquo;re doing include:&lt;/p>
&lt;ul>
&lt;li>&lt;!-- raw HTML omitted --> &lt;a href="https://www.google.com/webmasters/tools/home?hl=en">https://www.google.com/webmasters/tools/home?hl=en&lt;/a>&lt;!-- raw HTML omitted -->&lt;/li>
&lt;li>&lt;!-- raw HTML omitted --> &lt;a href="http://siteexplorer.search.yahoo.com/">http://siteexplorer.search.yahoo.com/&lt;/a>&lt;!-- raw HTML omitted -->&lt;/li>
&lt;li>&lt;!-- raw HTML omitted --> &lt;a href="http://websitegrader.com/">http://websitegrader.com/&lt;/a>&lt;!-- raw HTML omitted -->&lt;/li>
&lt;/ul></description></item><item><title>Requirements Gathering for Consumer Startups</title><link>/2011/02/08/Requirements-Gathering-for-Consumer-Startups/</link><pubDate>Tue, 08 Feb 2011 06:49:20 -0800</pubDate><guid>/2011/02/08/Requirements-Gathering-for-Consumer-Startups/</guid><description>&lt;p>Almost all development projects start with a hunch at a problem. Seldom do you have the luxury of enough resources, prior to beginning building, to fully vet all assumptions and define all requirements. Or at the very least if you do, you&amp;rsquo;re not in startup mode. For this reason the very first thing you build is often not the perfect solution. If you&amp;rsquo;re lucky it&amp;rsquo;s a start at a solution, and even if it&amp;rsquo;s not, if you&amp;rsquo;re close users will tell you what they want.&lt;/p>
&lt;p>What this leaves you with is a couple of key items. First, get to the minimum product you can to vet your idea, most commonly known as the Minimum Viable Product. This should be the minimum product you need to vet your idea and add some form of value for users. Once you&amp;rsquo;ve created this, don&amp;rsquo;t refine, don&amp;rsquo;t keep iterating, &lt;!-- raw HTML omitted -->launch&lt;!-- raw HTML omitted -->. More time won&amp;rsquo;t let you perfectly solve the problem; getting it in front of users will help you solve things properly.&lt;/p>
&lt;p>Second, make feedback simple. Oftentimes giving feedback can be difficult; if it&amp;rsquo;s a form that requires 5 fields on your website, forget it. If at all possible, provide multiple ways for users to give feedback. ALL ways should be simple for users. Email is great, because every user already has it. If it can be built on your website, great, but you should make registering as light as possible. Personally, I&amp;rsquo;m a big fan of &lt;!-- raw HTML omitted -->Get Satisfaction&lt;!-- raw HTML omitted -->. They give a great embeddable widget and provide a service that just simply works.&lt;/p>
&lt;p>Third, and perhaps most important: listen to your users! When users start to give you feedback, that&amp;rsquo;s the best set of requirements you can receive. There are a couple of pieces to this step. When users give you feedback, you want to acknowledge them. Give some recognition for giving feedback; respond as positively and supportively as possible. If it&amp;rsquo;s at all possible to act on requests, do it; it will set you apart from the typical services they use, where they never get a response and never see a change.&lt;/p>
&lt;p>In short, three key steps make for requirements gathering that trumps all else:&lt;/p>
&lt;!-- raw HTML omitted --></description></item><item><title>Tactical Steps for Startup Metrics</title><link>/2011/02/03/Tactical-Steps-for-Startup-Metrics/</link><pubDate>Thu, 03 Feb 2011 19:39:54 -0800</pubDate><guid>/2011/02/03/Tactical-Steps-for-Startup-Metrics/</guid><description>&lt;p>Metrics are obviously a very valuable area for start-ups; if you don&amp;rsquo;t believe in metrics and think your idea wins just because it&amp;rsquo;s great, then you&amp;rsquo;d better start searching for your next day job. &lt;!-- raw HTML omitted -->Dave McClure&lt;!-- raw HTML omitted --> has done a great talk on start-ups several times over; you can check out a video and corresponding slide show at:&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->&lt;a href="http://www.ustream.tv/recorded/5336115">http://www.ustream.tv/recorded/5336115&lt;/a>&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted -->&lt;a href="http://www.slideshare.net/dmc500hats/startup-metrics-for-pirates-long-version">http://www.slideshare.net/dmc500hats/startup-metrics-for-pirates-long-version&lt;/a>&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>And besides, it&amp;rsquo;s a pirate acronym, so it&amp;rsquo;s got to be great. But translating these from concept to actual technology metrics is also something that needs discussion. You can&amp;rsquo;t exactly say we&amp;rsquo;re going to use Google Analytics and let it magically tell you everything; furthermore, all of the web developers out there talking about tweaks on Google Analytics do little to actually tell you what you need to know.&lt;/p>
&lt;p>To start with I&amp;rsquo;m going to lay out a few of the key features of Google Analytics, assuming it&amp;rsquo;s the key backbone for web analytics, and because it really is an amazing free tool. If you&amp;rsquo;re not familiar with Google Analytics, go and explore before reading this; we&amp;rsquo;re going to skip right over the basics of visitors and traffic sources, directly to what you need to use and customize to get valuable metrics for a start-up. Diving right in, a few key items:&lt;/p>
&lt;p>Goals - These are custom items that you want to set up. What is a goal? It can be a bit abstract, but it can be a specific page that was visited, time spent on the site, or a certain number of pages per visit.&lt;/p>
&lt;p>Events - Here you can start tracking events of any type you like. In most cases this is some in-page action, but the key is that with each event you can track a value. This starts to become useful if you want to track most-shared content, most-liked products, or other items. Typically those kinds of items might exist within your in-house database, but exposing them to Google Analytics gives you an extra level of reporting integration.&lt;/p>
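With the classic asynchronous (ga.js-era) tracker, an event with a value is a single push onto the `_gaq` command queue; the category, action, and label names below are hypothetical:

```javascript
// Classic async Google Analytics: until ga.js loads, _gaq is just an array,
// so commands can be queued safely at any point on the page.
var _gaq = _gaq || [];

// _trackEvent(category, action, opt_label, opt_value): the optional numeric
// value at the end is what lets you report on things like most-shared content.
_gaq.push(['_trackEvent', 'Content', 'Share', 'post-42', 1]);
```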
&lt;p>Custom Reporting - Google Analytics is great out of the box; if you feel like it just grazes the surface then you haven&amp;rsquo;t explored it. But in some cases it&amp;rsquo;s easier to create a custom report for data that you can&amp;rsquo;t get exactly to, or to get very quick insight into similar data.&lt;/p>
&lt;p>So this is really high level, but how do you use each of the above to actually implement the metrics you care about? We&amp;rsquo;ll go through an example of how you track each of Dave McClure&amp;rsquo;s 5 key metrics. For those of you not familiar with his metrics, please go to the above links and listen to his presentation first.&lt;/p>
&lt;!-- raw HTML omitted --></description></item><item><title>Events with Google Analytics and Tricking Pageviews</title><link>/2011/02/02/Events-with-Google-Analytics-and-Tricking-Pageviews/</link><pubDate>Wed, 02 Feb 2011 16:28:56 -0800</pubDate><guid>/2011/02/02/Events-with-Google-Analytics-and-Tricking-Pageviews/</guid><description>&lt;p>Google Analytics is great out of the box; the basic tracking tag on every page will do a lot for you. Unfortunately most people never get beyond this. There are two key things you can do with tracking that will let you get a bit further. There&amp;rsquo;s also plenty more on the reporting side, but we&amp;rsquo;ll get to some of that later. On the tracking side the first item is event tracking. This is perhaps most commonly used for tracking various Javascript events that occur during a visit; however, it can also be a bit more flexible towards tracking values. A very simple example might be:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>Or a real life example of this, might be on a FAQ screen, clicking the link to an anchored section of the page:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>But events by their sheer nature give a bit more flexibility with that value field. In the case of a user sending a message, you might be able to track how many recipients it has, or any other numeric value you want to track.&lt;/p>
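Sketching that messaging example with the classic `_gaq` queue (the category and action names, and the `trackMessageSent` helper, are hypothetical):

```javascript
// Until ga.js loads, _gaq is a plain array that queues tracker commands.
var _gaq = _gaq || [];

// Hypothetical "message sent" hook: the final argument is the event value,
// here the recipient count, which shows up aggregated in GA's event reports.
function trackMessageSent(recipientCount) {
  _gaq.push(['_trackEvent', 'Messages', 'Send', 'recipients', recipientCount]);
}

trackMessageSent(3);
```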
&lt;p>Events overall are great, but you&amp;rsquo;re limited to the set of Google Analytics event reports to know what&amp;rsquo;s happening with them. So much of Google Analytics is based around page views; fortunately Google makes it easy to entirely fake a page view. If you&amp;rsquo;re wondering why you&amp;rsquo;d care whether it&amp;rsquo;s a page view versus an event, we&amp;rsquo;ll get to that in a later post. For now what&amp;rsquo;s important to know is that you can fake any page view with:
&lt;code>_gaq.push(['_trackPageview', '/somepagenamehere']);&lt;/code>&lt;/p>
&lt;p>While seemingly small tweaks and extra additions, these two items will create massive value for what you can actually do with Google Analytics. Stay tuned for how to use these with the base Google Analytics setup to actually get that value.&lt;/p></description></item><item><title>Converting Bookmarklet to Chrome Extension</title><link>/2011/02/02/Converting-Bookmarklet-to-Chrome-Extension/</link><pubDate>Wed, 02 Feb 2011 00:34:49 -0800</pubDate><guid>/2011/02/02/Converting-Bookmarklet-to-Chrome-Extension/</guid><description>&lt;p>Google&amp;rsquo;s documentation is pretty good when it comes to creating an extension that opens a full page and has large functionality. But if you&amp;rsquo;re more interested in transforming an existing bookmarklet into an extension, there isn&amp;rsquo;t much quality documentation on it. The steps themselves are really quite simple. The big key that&amp;rsquo;s not heavily documented is creating a background HTML page that sets up an event listener. After the jump is a full sample that would then call your javascript to activate the bookmarklet:&lt;/p>
&lt;p>manifest.json&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted -->
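The manifest contents were lost when the raw HTML was stripped above; as a rough sketch of the shape such a manifest took at the time (a browser action, a background page, and a content script; the extension name and icon path here are hypothetical):

```json
{
  "name": "My Bookmarklet",
  "version": "1.0",
  "background_page": "background.html",
  "browser_action": {
    "default_icon": "icon.png",
    "default_title": "Run bookmarklet"
  },
  "content_scripts": [
    {
      "matches": ["http://*/*", "https://*/*"],
      "js": ["bm.js"]
    }
  ]
}
```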
&lt;p>background.html&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;pre>&lt;code>chrome.browserAction.onClicked.addListener(function(tab) {
  chrome.tabs.sendRequest(tab.id, {fun: "callBM"});
});
&amp;lt;/script&amp;gt;&lt;/code>&lt;/pre>
&lt;p>bm.js&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted --></description></item><item><title>Selling Something New</title><link>/2010/12/27/Selling-Something-New/</link><pubDate>Mon, 27 Dec 2010 17:04:50 -0800</pubDate><guid>/2010/12/27/Selling-Something-New/</guid><description>&lt;p>I have a tendency to really latch onto very simple ideas. Typically these ideas don&amp;rsquo;t require complex engineering to make them happen. This is not to say the engineering is not important, but more so that it is some variation of engineering feats that have been done before. The reason I tend to prefer these over more complex engineering that really makes something better is that making something better is typically a marginal improvement. When it&amp;rsquo;s a marginal improvement, it&amp;rsquo;s a lot harder to sell.&lt;/p>
&lt;p>With marginal improvements you have to:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>In contrast, if I address a problem that hasn&amp;rsquo;t been solved, my life instantly becomes a lot easier. I no longer have to argue that something isn&amp;rsquo;t good enough today; it becomes a question of value and how much it&amp;rsquo;s worth to solve the problem. Haggling over price is a conversation I&amp;rsquo;d rather have than trying to justify value and convince a customer they&amp;rsquo;ve been wrong in their choice for so many years.&lt;/p></description></item><item><title>Interviewing, A Reflection of the Company</title><link>/2010/08/29/Interviewing-A-Reflection-of-the-Company/</link><pubDate>Sun, 29 Aug 2010 17:39:05 -0700</pubDate><guid>/2010/08/29/Interviewing-A-Reflection-of-the-Company/</guid><description>&lt;p>The more I&amp;rsquo;ve been exposed to it, the more I see that the way a company conducts interviews is a very strong reflection of the company&amp;rsquo;s current state. If you experience a very haphazard interview, it&amp;rsquo;s likely because the person interviewing is haphazard in other aspects of their day to day. If you find that someone is very set in their mind about what they want, expecting a very cookie-cutter answer, it&amp;rsquo;s a reflection of how they think. There are some cases in which the interviewers simply do not know various methods/styles, so I&amp;rsquo;d like to lay out what I feel is an appropriate interviewing process. I&amp;rsquo;ve been in situations where more of this process was followed than not, and in those cases bad hires were the exception, not the norm. The biggest unknown after that was how long until someone was fully integrated into the culture; no longer a noob but a veteran in some area that others deferred to.&lt;/p>
&lt;p>In the interviewing process the very first key is knowing your role. Hopefully there&amp;rsquo;s more than one person interviewing. If you&amp;rsquo;re interviewing you have the need; the person on the other side of the table from you may or may not. It may just be an opportunistic interview for them, or they may be avidly looking for an opportunity. Either way you should ALWAYS be in a sell mode of some form; the only question is how heavy this sell mode is. Are you 10% selling, 50% selling, 70% selling, 90% selling? In my experience I&amp;rsquo;ve never been in an interview where I was less than 50% selling to whomever I&amp;rsquo;m interviewing. Being in this mode is either going to convince them to come if they&amp;rsquo;re not looking that hard, OR make them even more desperate to join you, which means you could offer them less (while I do have issues with this, the fact remains that it happens).&lt;/p>
&lt;p>So back to your role: the key roles correspond pretty directly to the types of interviews you should conduct. These can be blurred/mixed and can be conducted by different people or the same person, but at the very least don&amp;rsquo;t mix the questions; follow some structured order. The key types are:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>Next is the Behavioral/Contextual interview. This will consist of many &amp;ldquo;what would you do in this situation&amp;rdquo; questions. There are books upon books of example questions and how to respond to them. What the interviewer is looking for is that you solve a particular issue and follow a logical process. If they ask a question about dealing with conflict, they don&amp;rsquo;t want to see that you just ignored it. There&amp;rsquo;s a fine line between addressing the conflict so the working environment is better and ensuring the project makes progress.&lt;/p>
&lt;p>Finally is the Technical interview. Approaching this from a very technical area like programming, you should not be testing syntax. You should be testing generic programming thoughts/concepts. If you want to give a syntax test, go online and find one of the thousands that exist. If I&amp;rsquo;m not confident someone can pick up a language knowing general constructs, I would never hire them, and they wouldn&amp;rsquo;t have made it this far in the process. If you want to throw in 1-2 questions like that, that&amp;rsquo;s a manageable amount, but most of the interview should follow more open-ended questions, questions that have multiple answers. How to write a for loop in Java is not acceptable; how to write a function that produces Fibonacci numbers and a corresponding test is acceptable. Any time someone asks a question like this, I&amp;rsquo;m open to working at the company. It shows they put thought into the interview process and care about the quality of their hires. Those types of questions test several concepts at once:&lt;/p>
&lt;!-- raw HTML omitted -->
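A question of that shape might look like the following in JavaScript; the implementation and its small test are one possible sketch, not a prescribed answer:

```javascript
// Iterative Fibonacci: returns the nth number in the sequence,
// with fib(0) = 0 and fib(1) = 1. Iteration avoids the exponential
// blowup of the naive recursive version.
function fib(n) {
  if (n < 0) throw new Error("n must be non-negative");
  var a = 0, b = 1;
  for (var i = 0; i < n; i++) {
    var next = a + b;
    a = b;
    b = next;
  }
  return a;
}

// A corresponding test, as the question asks for: check the first few
// known values and throw if any of them is wrong.
function testFib() {
  var expected = [0, 1, 1, 2, 3, 5, 8, 13];
  for (var i = 0; i < expected.length; i++) {
    if (fib(i) !== expected[i]) throw new Error("fib(" + i + ") is wrong");
  }
}
testFib();
```

The interesting discussion is in the follow-ups: iterative versus recursive, what negative input should do, and how the test cases were chosen.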
&lt;p>As I mentioned before, there are several types of interviews and various levels of selling that occur. During EVERY interview you should give the person you are interviewing the opportunity to ask questions. Whenever this occurs you&amp;rsquo;re in selling mode, and often after a question you&amp;rsquo;re in selling mode. Your answers should NOT be one-word answers; they should be thorough and open the opportunity for follow-up questions. With any of these questions you should be able to interview someone at a level lower than you, or even higher than you. There are some job-specific ones, such as managing people or budgets when interviewing upwards, that you may not address, but a lower-level employee interviewing a manager is a valuable part of the process.&lt;/p>
&lt;p>With regards to your role, it should be discussed ahead of time; you should conduct 2 behavioral interviews if you have an unsure result from the first. You should not ask the same questions if they passed with flying colors the first time. If you do, and there are cases where it&amp;rsquo;s relevant, know your reasoning behind it. The bottom line is know the type of interview you&amp;rsquo;re conducting, discuss it internally, and know what you&amp;rsquo;re looking for as a result. If you haven&amp;rsquo;t put this much thought and effort into the process it will be apparent, and the resulting quality of person you get will be a direct reflection of that.&lt;/p></description></item><item><title>Selling... Seduction... same difference</title><link>/2010/05/03/Selling...-Seduction...-same-difference/</link><pubDate>Mon, 03 May 2010 21:25:01 -0700</pubDate><guid>/2010/05/03/Selling...-Seduction...-same-difference/</guid><description>&lt;p>On an entirely separate blog I have a full write-up on seduction. The other posts contain steps for how a guy would seduce a girl; I think it&amp;rsquo;s actually quite pertinent to selling within business. Before you start making too many assumptions about the other post, let me explain a little further, from the selling side within business.&lt;/p>
&lt;p>You see, in selling something there are usually a lot of sides to what you&amp;rsquo;re selling, just as there are to a person. The real key is to know which features are relevant; while you might like the option of mind reading, a more practical one is to become friendly early. Become friendly with them quickly, commiserate with their woes, and try to bond with them over similar experiences. This will make the initial conversation about what they&amp;rsquo;re looking for much more constructive. And while I say this is a conversation about what they&amp;rsquo;re looking for, what you should be asking is what their pains are.&lt;/p>
&lt;p>For every pain that exists, there are 5 to 10 ways to solve it. However, if you miss the pain points and problems they&amp;rsquo;re having, you&amp;rsquo;re more than likely to miss on the pitch. If you are able to identify the pain points correctly, it simply becomes a process of guiding them in ways to solve their problem. This process usually starts with dissecting the fundamental issues in the process, then building it back up with your product or solution as the backbone. In the same way it&amp;rsquo;s hard to win someone over on a 1-on-1 personal level without knowing what they want, you can&amp;rsquo;t sell to someone that doesn&amp;rsquo;t have a problem, and you won&amp;rsquo;t be able to pitch well without knowing it.&lt;/p></description></item><item><title>Why The Cloud Will Finally Work</title><link>/2010/04/05/Why-The-Cloud-Will-Finally-Work/</link><pubDate>Mon, 05 Apr 2010 17:29:35 -0700</pubDate><guid>/2010/04/05/Why-The-Cloud-Will-Finally-Work/</guid><description>&lt;p>The cloud has a lot of technical arguments going for it. The problem is consumers don&amp;rsquo;t understand the cloud; they don&amp;rsquo;t understand virtual storage and growth and syncing and the complexities of things. The average consumer is generally pretty dumb; they just want to be able to do things and have it just work. If they ask a question they want an answer, not the deduction behind the answer. It&amp;rsquo;s why I loved mint.com so much when it launched. I gave it accounts and it told me everything I wanted to know. If it was wrong I seldom noticed, such as classifying a purchase into a wrong category. My suspicion is that 98% of users don&amp;rsquo;t notice much of the mis-classification that happens. They look the first time and it looks pretty good, so they trust it, because if you had to look at 90% of purchases and classify them yourself, why use mint? Why not just use excel, or even go back to paper and notebook?&lt;/p>
&lt;p>Cloud is that same type of issue: it needs to just work, and users need to just expect their document to always be the same. I think the iPad, but more specifically MobileMe, will have a great shot at doing this. The reason the iPad will play a role is that now the average consumer will have more than 1 device. They&amp;rsquo;ll have their laptop/desktop and an iPad. This user will want to work on the device, and do more than just email. Having their application open a file, working for 15 minutes on a train/bus, turning it off, walking in their front door, and opening the same file on their computer will really bring cloud storage/computing to the consumer. Because Apple controls the reins on the primary applications where this has value:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>This will push the expectation on developers to deliver the same kind of experience with the cloud; it&amp;rsquo;s going to become the norm now. Not because it&amp;rsquo;s cost effective, not because of the technical benefits, but because it&amp;rsquo;s going to be transparent to users, and users are just going to expect it from now on.&lt;/p></description></item><item><title>Valuing Employees</title><link>/2010/02/27/Valuing-Employees/</link><pubDate>Sat, 27 Feb 2010 17:10:23 -0800</pubDate><guid>/2010/02/27/Valuing-Employees/</guid><description>&lt;p>A coworker (&lt;!-- raw HTML omitted -->@danfarina&lt;!-- raw HTML omitted -->) and I were recently having conversations about employee compensation. We covered the gamut of employee feedback, evals, and compensation. He mentioned Joel Spolsky and his format of being very open about where individuals are ranked. He also pointed me to: &lt;a href="http://alumnit.ca/~apenwarr/log/?m=200904#05">http://alumnit.ca/~apenwarr/log/?m=200904#05&lt;/a>, which provided good insight, though I like his final point most. The end goal of evaluating your employees and their compensation is to make sure they&amp;rsquo;re happy. Sure, the business should make sure employees are worth what they&amp;rsquo;re being paid, but usually there is no question about this, or if there is you&amp;rsquo;re quickly escorted out the door. While this is an interesting model, I think it can be much simpler; companies usually confine themselves too much in giving credit to employees.&lt;/p>
&lt;p>There was another recent occasion where a statement was made of &amp;rsquo;no more playing stick them up, until next year&amp;rsquo;. When I first thought about this, I knew I didn&amp;rsquo;t like the statement, but was unsure why. The reason is that there can be several reasons why employees leave, only one of which is compensation. If you feel you&amp;rsquo;re being adequately compensated for the job you&amp;rsquo;re doing, it makes sense.&lt;/p>
&lt;p>But there&amp;rsquo;s another reason thats very clear in the valley but less clear in other parts of the country. Paul Buchheit at Startup School this weekend in Berkeley said it very well: If you&amp;rsquo;ve been at your job too long, QUIT. Meaning if you&amp;rsquo;re comfortable, you know the people, you know how to do your job, and you&amp;rsquo;re not being challenged, then you should go somewhere where you are challenged.&lt;/p>
&lt;p>So what does this have to do with playing stick em up? Well, if you&amp;rsquo;re at a comfortable place you should be compensated appropriately; that&amp;rsquo;s fair. However, if you&amp;rsquo;re at a comfortable place, you should either find ways to be challenged there or move on. If you&amp;rsquo;re challenged there, it means your role will change over time; there&amp;rsquo;s no standard guide for how quickly you become experienced in that role. If it&amp;rsquo;s two weeks, then salary should be re-evaluated then; if it&amp;rsquo;s 3 years, salary should perhaps be re-evaluated yearly to keep up with changes in the value of the dollar, but nothing more substantial than that.&lt;/p>
&lt;p>At the end of the day it means you have to deliver value to an employer, and as long as you&amp;rsquo;re doing that the employer should recognize you for the value you deliver, based on merit, not based on policies laid out. Whether you jump to the extreme of merit/value being very clear, such as a Joel Spolsky method, or follow something more traditional of a large company, the bottom line is you should give your employees what they&amp;rsquo;re worth, and as an employee it&amp;rsquo;s what you should expect.&lt;/p>
</description></item><item><title>Who will filter the stream first?</title><link>/2010/01/26/Who-will-filter-the-stream-first/</link><pubDate>Tue, 26 Jan 2010 17:56:01 -0800</pubDate><guid>/2010/01/26/Who-will-filter-the-stream-first/</guid><description>&lt;p>Facebook is where I have more noise than any other social site; twitter may even tie facebook in the sheer amount of content I receive in my feed. With regard to the ratio of what I care about to what I see, facebook is a lot better, due to their news feed versus live feed. However, their news feed is still very often off. I wrote some time back about web 3.0, and how essentially showing what I want to see is what the web will become. You&amp;rsquo;ll take the vast amount of content and distill it into what I want to see. People seem to be taking very haphazard shots at it and it&amp;rsquo;s quite a letdown.&lt;/p>
&lt;p>I&amp;rsquo;ll start with twitter. Twitter does no filtering of the content on its side; instead they put the control in the user&amp;rsquo;s hands for me to create filters based on friends. This means I have to take time to go through all 600+ people I follow and group them into lists, then navigate each list when I want to view such topics. Not only is this time intensive, it still doesn&amp;rsquo;t accomplish what I want, which in a lot of cases is information by topic, especially on twitter.&lt;/p>
&lt;p>Moving on to facebook, they at least take care of the process (almost transparently) of deciding who I want to see. If someone shows up, I can simply hide them from the news feed. I have a strong hunch that when I click out of the news feed and go to someone&amp;rsquo;s profile, it weights that person to appear more frequently. This is a very logical deduction to make, and in most cases I&amp;rsquo;m pretty pleased with the result. The big problem is that it&amp;rsquo;s still all about the people, not about the content. If I clicked on someone because they mentioned coming to visit California, I may not have talked to them in 2 years, but would simply like to offer up my help when they visit. This doesn&amp;rsquo;t mean I want to get updates about them after they visit.&lt;/p>
&lt;p>Facebook is definitely a leader in this space; first, they&amp;rsquo;re one of the few with enough content in a feed that filtering even matters. And the fact that they get a user beyond analysis-paralysis is a positive move; however, the classification is wrong. Whether it&amp;rsquo;s twitter, facebook, or some other service that hasn&amp;rsquo;t emerged yet, filtering a mass of information down to what a user cares about will be huge.&lt;/p>
&lt;p>Amazon and Netflix have done this for products; why has no one tried this for information?&lt;/p></description></item><item><title>Issues Aren't Always Bad</title><link>/2010/01/25/Issues-Arent-Always-Bad/</link><pubDate>Mon, 25 Jan 2010 17:30:10 -0800</pubDate><guid>/2010/01/25/Issues-Arent-Always-Bad/</guid><description>&lt;p>I often encounter people, whether at my office or at other places of employment, who are distraught after getting an earful from a manager over some problem arising. The problem usually isn&amp;rsquo;t in their control, and therefore they don&amp;rsquo;t understand why they get heat for it. Most managers actually do understand when issues come up; what they don&amp;rsquo;t appreciate is late notice, lack of problem solving, and dictating what should be done next.&lt;/p>
&lt;p>Managers typically want individuals to take control of a situation and work towards resolving it.&lt;/p>
&lt;p>One thing you can do to ease the backlash that may occur when issues come up is to communicate proactively as things develop/occur, or the lack thereof. Keep in mind this should relate well to your manager&amp;rsquo;s style; some managers only want details when they absolutely have to have them. In that case you&amp;rsquo;ll want to gradually give your manager a heads up, but not burden them with too much information. I would venture to say, however, that most managers appreciate details; details give them insight into how things are going and allow them to feel engaged at a lower level.&lt;/p>
&lt;p>So assume you&amp;rsquo;ve communicated regularly with your manager. This still does not prevent issues from happening, but rather reduces the shock when something does. At this point a manager still does not want a bare statement that there&amp;rsquo;s a problem. In every case I&amp;rsquo;ve encountered, the manager wants you to take ownership of the issue, meaning to give some options. Once the problem has arisen you should instantly start looking for ways to solve it. Often these are not within your power to make the final decision on, though you do have a great deal of control in presenting the case to a manager.&lt;/p>
&lt;p>Finally, if you want brownie points, take less credit for the work you&amp;rsquo;ve done and give your manager more. If you&amp;rsquo;ve communicated early and laid out various options for how to resolve the issue, with pros and cons of each, you&amp;rsquo;ve done what you can. This should make it very easy for your manager to simply say, go with Option B, and follow back up with me on Monday. At this point if you give your manager most of the credit for helping with the issue, it will only come back to you. While this is potentially the least critical of the three points, it can often pay off equally as much.&lt;/p>
&lt;p>This becomes easier as you pay attention to issues and start to become proactive. Taking ownership may not be in your job title or description, but it will definitely get you fewer earfuls from managers, and likely move you through the ranks faster.&lt;/p></description></item><item><title>Forget Doing Something Better, Do Something Different</title><link>/2010/01/21/Forget-Doing-Something-Better-Do-Something-Different/</link><pubDate>Thu, 21 Jan 2010 14:49:33 -0800</pubDate><guid>/2010/01/21/Forget-Doing-Something-Better-Do-Something-Different/</guid><description>&lt;p>I have a tendency to really latch onto very simple ideas. Typically these ideas don&amp;rsquo;t require complex engineering to make them happen. This is not to say the engineering is not important, but more so that it is some variation of engineering feats that have been done before. The reason I tend to like these over more complex engineering that really makes something better is that making something better is typically a marginal improvement. When it&amp;rsquo;s a marginal improvement it&amp;rsquo;s a lot harder to sell.&lt;/p>
&lt;p>With marginal improvements you have to:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>In contrast, if I address a problem that hasn&amp;rsquo;t been solved, my life instantly becomes a lot easier. I no longer have to argue that something isn&amp;rsquo;t good enough today; it becomes a question of value and how much it&amp;rsquo;s worth to solve the problem. Haggling over price is a conversation I&amp;rsquo;d rather have than trying to justify value and convince a customer they&amp;rsquo;ve been wrong in their choice for so many years.&lt;/p>
&lt;p>Over the coming days I&amp;rsquo;m going to be posting a few of these examples/ideas and why I like them. Many of them are still being thought through, and as I sort them out I&amp;rsquo;m generally happy to publish high points about them. The even bigger key here is that success is typically in the execution and less so in the idea, though even then I&amp;rsquo;d prefer to execute on something that has fewer battles than something that from the onset has more. By doing something that is already being done today you get no advantage in penetrating the market.&lt;/p></description></item><item><title>Parallelizing the Product Process?</title><link>/2010/01/20/Parallelizing-the-Product-Process/</link><pubDate>Wed, 20 Jan 2010 14:42:49 -0800</pubDate><guid>/2010/01/20/Parallelizing-the-Product-Process/</guid><description>&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted --></description></item><item><title>Who Cares About Visitors?</title><link>/2010/01/19/Who-Cares-About-Visitors/</link><pubDate>Tue, 19 Jan 2010 15:55:22 -0800</pubDate><guid>/2010/01/19/Who-Cares-About-Visitors/</guid><description>&lt;p>The web is becoming saturated. It&amp;rsquo;s no longer the pimply-faced 20-somethings living in their moms&amp;rsquo; basements that are the key users and the source of most of the traffic on the web. Now you have communities for pregnant moms, sites for elderly widows looking to date, and social sites for kids from the time they&amp;rsquo;re able to talk. So now that the web is hitting its saturation point for the types of people interacting, it becomes a critical issue to take advantage of those users and get them to do more.&lt;/p>
&lt;p>Up until this point it&amp;rsquo;s been about getting more users; more people easily translated into more hits, of course. But now what people want is richer engagement: they want users to be active, to contribute the content. When users are more active it means just as many hits, but more engagement is a harder problem than more users. More users usually meant more marketing budget and a bit of allowing the market to mature. More engagement means you actually have to be more methodical about what you&amp;rsquo;re doing. It means you have to more closely balance the quality of what is on your site against how you manage the ads which equate to revenue.&lt;/p>
&lt;p>The news that facebook is exposing how many impressions a page gets versus the amount of feedback it has received is a driver in this direction. Fortunately for facebook, they don&amp;rsquo;t have control over Fan Pages, so it&amp;rsquo;s not their job to drive up engagement on a per-page basis. Unfortunately for other major publishers this is a very unscientific science, and what works and doesn&amp;rsquo;t work can only be discovered by trying out new options. There&amp;rsquo;s a lot of opportunity here for those that really figure out what the key drivers of engagement are; it&amp;rsquo;s one thing to measure it, it&amp;rsquo;s another to be able to readily define how to improve it, and no one has repeatedly done that on the web yet.&lt;/p></description></item><item><title>Behavioral Targeting versus Contextual Advertising</title><link>/2010/01/14/Behavioral-Targeting-versus-Contextual-Advertising/</link><pubDate>Thu, 14 Jan 2010 17:25:37 -0800</pubDate><guid>/2010/01/14/Behavioral-Targeting-versus-Contextual-Advertising/</guid><description>&lt;p>There&amp;rsquo;s a continual shift that seems to be happening on whether contextual advertising is better than behavioral. It seems that most people are becoming bigger and bigger on behavioral, and assuming that contextual has reached its peak. After meeting with a company that at first started to do both, blurring the lines and taking advantage of each when they had appropriate data, it started to become clear that they each have their place and time. Behavioral and Contextual shouldn&amp;rsquo;t be direct competitors.&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted --></description></item><item><title>Does Authenticity Matter?</title><link>/2009/10/09/Does-Authenticity-Matter/</link><pubDate>Fri, 09 Oct 2009 20:36:41 -0700</pubDate><guid>/2009/10/09/Does-Authenticity-Matter/</guid><description>&lt;!-- raw HTML omitted -->
&lt;p>Facebook is as authentic a network as you can have: you are you, you&amp;rsquo;re not FunChick21 or MotorcycleGuy42, you&amp;rsquo;re Craig Kerstiens. You have a birthdate, which is likely your birthdate; you have a job that is your job; you have friends that are your friends. Facebook is probably as close to a virtual representation of your true life as you can get on a social network.&lt;/p>
&lt;p>Then you have Twitter. Twitter is probably as inauthentic as they come. You ARE FunChick21 or MotorcycleGuy42. You have that name, and that&amp;rsquo;s it. You have friends, but they&amp;rsquo;re up to you; it&amp;rsquo;s a one-way relationship, not a confirmation of friendship. For that reason you have 1,000,000 people following Ashton Kutcher, and he follows under 100. Your friends could be celebrities, they could be friends, they could be random people whose tweets you liked.&lt;/p>
&lt;p>On facebook, I know almost everything I could want to about you from an ad-targeting perspective. Few sites could give much more of the demographic info I&amp;rsquo;d want to target effectively. On Twitter I have next to nothing: a user name, and the content of what you say.&lt;/p>
&lt;p>So there&amp;rsquo;s an advantage to facebook. But then you have the context of what I&amp;rsquo;m trying to do. If you&amp;rsquo;re facebook, you have users engaged in the site not wanting to leave. If you&amp;rsquo;re Twitter you have users that won&amp;rsquo;t be on the site for more than 60 seconds. Getting them to leave shouldn&amp;rsquo;t be an issue, which means if you can drive where they are leaving to, it should work out well for you.&lt;/p>
&lt;p>But there&amp;rsquo;s a final piece. It&amp;rsquo;s a heavily growing marketplace that really neither of the major communities picks up on, and it&amp;rsquo;s virtual goods. Virtual goods exist in either form of network, but neither seems to take advantage; meanwhile it&amp;rsquo;s the entire basis behind such communities as World of Warcraft. How they will start to roll into mainstream networks is yet to be seen, but I&amp;rsquo;ll be curious whether virtual goods can become dominant in authentic networks or whether they&amp;rsquo;ll primarily reside in inauthentic networks as they do today.&lt;/p></description></item><item><title>Motivating Users</title><link>/2009/09/30/Motivating-Users/</link><pubDate>Wed, 30 Sep 2009 22:25:24 -0700</pubDate><guid>/2009/09/30/Motivating-Users/</guid><description>&lt;p>I&amp;rsquo;ve done some recent advising for someone working on a site that&amp;rsquo;s of a social nature. The site is intended in some form to motivate users; the initial thought on this was to define a lot of rules and send automated messages to users. To me this approach felt very 1990&amp;rsquo;s. So assuming that&amp;rsquo;s true, the question becomes: how do you motivate users?&lt;/p>
&lt;p>I think rules do usually come somewhere in the process; you need to know when to motivate the users. The catch is how you expose the results of those rules. In thinking about it, there&amp;rsquo;s a variety of levels at which you can expose those rules, from more raw data forms to several steps of analysis or actions on top of them. Raw data works for analytical users, users that naturally consume data AND are already heavily motivated for the set goal. Let&amp;rsquo;s say for sake of argument this is 10% of users. This means by exposing very little gloss, and mostly the data the rules are run on, you&amp;rsquo;ll lose 90% of users.&lt;/p>
&lt;p>The other extreme is to almost entirely hide the rule and obfuscate it with actions that you know can lead the user back to their goal. I feel not quite this, but some derivative of it, in the form of social nudging, could be incredibly useful for many different ends. There are many real-world situations where people rely on each other for support; you can think of traditional AA-type settings, or weight loss, or other groups with a central focus of accountability. But these forms of groups seem to be largely absent in the virtual space, or when they are present are real-world groups simply dropped into a virtual space.&lt;/p>
&lt;p>What happens when users want some mixed level of privacy, but encouragement? What you need to do is have your rules define when users need to get motivation from others. The fun part is how you drive users to interact at those times; this can be through a variety of options, all depending on your site. A very basic example of this is the birthday reminder on facebook. They&amp;rsquo;re not just reminding you of it for the sake of it; they&amp;rsquo;re reminding you so you use applications to send virtual birthday cards and messages, thereby enriching the user experience. If other sites were able to apply this to goal setting, and use social nudging over system rules, users would feel more connected to others and it would likely increase effectiveness.&lt;/p></description></item><item><title>Micromanaging</title><link>/2009/09/24/Micromanaging/</link><pubDate>Thu, 24 Sep 2009 12:41:57 -0700</pubDate><guid>/2009/09/24/Micromanaging/</guid><description>&lt;p>Three times in recent years I&amp;rsquo;ve had to micromanage others, though probably in the contrary form to what you would expect. Most people think of micromanagement as their manager wanting to know every detail about their day, and being involved in every minute task. In most cases this form of micromanagement is never received well. Generally my feeling is that if I have to micromanage you, you don&amp;rsquo;t belong in the role you&amp;rsquo;re in, though I suppose exception cases may exist.&lt;/p>
&lt;p>But the form of micromanagement I&amp;rsquo;m talking about is upward management. This could be needed for a variety of reasons:&lt;/p>
&lt;!-- raw HTML omitted -->
&lt;p>First, it&amp;rsquo;s helpful to start with a regular process. Sending status emails every morning or every afternoon will keep them in the loop and prevent them from asking too many questions, but mainly with knowledge you feel is pertinent.&lt;/p>
&lt;p>Discussions and calls with insider info will allow them to feel as if they&amp;rsquo;re driving the process. If you present information as factual and provide the facts of how certain things have historically worked, or would work at a tactical level from your experience, it will help to steer the process in that direction. If they&amp;rsquo;re outside their comfort zone they&amp;rsquo;re going to take their best guess; it&amp;rsquo;s 50-50 whether that&amp;rsquo;s the same as you&amp;rsquo;d see fit.&lt;/p>
&lt;p>But do let them drive the process at hand. If they feel as if you&amp;rsquo;re attempting to drive it, then they&amp;rsquo;re going to feel as if their authority is being challenged. It&amp;rsquo;s more akin to telling them the directions of how to get there and letting them drive the car. If they&amp;rsquo;re slightly off path, but in the right direction, let it go, because otherwise your time will be consumed with trying to give the exact directions.&lt;/p></description></item><item><title>Why the enterprise cant reach consumers</title><link>/2009/09/07/Why-the-enterprise-cant-reach-consumers/</link><pubDate>Mon, 07 Sep 2009 20:21:08 -0700</pubDate><guid>/2009/09/07/Why-the-enterprise-cant-reach-consumers/</guid><description>&lt;p>Most of my working career has been in what many would call an enterprise environment. Corporate structure was well in place at most of them, and in those cases development closely followed a waterfall methodology. You laid out requirements strictly and then built to those requirements. You essentially had nothing to show until you got to the end product.&lt;/p>
&lt;p>Having been in the valley for several years and interacting with startups and in other settings, I&amp;rsquo;ve seen the very opposite mindset: the &amp;ldquo;release early, release often&amp;rdquo; concept. First, you never have clear requirements when dealing with anything a startup should be tackling; if it&amp;rsquo;s a very clear, easy-to-solve problem, then someone else will have already tackled it. If you&amp;rsquo;re doing something new, which you should be, you can&amp;rsquo;t gauge how users react until you actually have something in front of them.&lt;/p>
&lt;p>A prime example would be twitter. Twitter was a simple concept, yet it had been done before in many ways; what&amp;rsquo;s the difference between blogging and twitter? Well, twitter requires that you&amp;rsquo;re shorter, has no title, just content, and centralizes the data. It actually dramatically reduced what the user could do, and in doing that created new and broader functionality.&lt;/p>
&lt;p>As a more general principle, users don&amp;rsquo;t know what they want. Users will complain about how gmail doesn&amp;rsquo;t have folders, but they use folders in outlook only because they can&amp;rsquo;t properly search. If you take away something from a user, they&amp;rsquo;re going to complain about it. This is fine, it&amp;rsquo;s not a problem, as long as they didn&amp;rsquo;t actually use the feature, and there are other steps you can use to manage this backlash.&lt;/p>
&lt;p>But I&amp;rsquo;d argue the overarching key is that you can&amp;rsquo;t ask users what they want. If you presume to know, you&amp;rsquo;re going to be wrong. So what does this mean? It means that you build something and you launch it. You don&amp;rsquo;t test it in user groups, you don&amp;rsquo;t test it in a lab, you don&amp;rsquo;t test it in an invite-only beta: you launch it. You launch for users, and if they don&amp;rsquo;t like it, you haven&amp;rsquo;t upset thousands of customers, because you don&amp;rsquo;t have that many. In consumer land you can launch something without anyone knowing who you are, and then truly test how users will respond; this is far more powerful than the traditional model used in enterprise. It&amp;rsquo;s the reason most of the biggest sites used today emerged from startups and similar environments: they built themselves on what users want.&lt;/p></description></item><item><title>How to succeed in the workplace? Go to lunch!</title><link>/2009/09/06/How-to-succeed-in-the-workplace-Go-to-lunch/</link><pubDate>Sun, 06 Sep 2009 17:20:45 -0700</pubDate><guid>/2009/09/06/How-to-succeed-in-the-workplace-Go-to-lunch/</guid><description>&lt;p>Something I learned very early in my working career, not so much from my own experiences but from observing the results of others, was to engage at a social level as early as possible. This doesn&amp;rsquo;t mean you have to take time after 5:00 to get to know someone; the best opportunity exists every single day during what you would already do: lunch! Everyone usually takes a break and eats lunch during the day, and usually there are two groups in an office: those that always go out, and those that bring their lunch, meet others for lunch, or maybe even work through it. If you notice those in the first group in your office, my guess is it&amp;rsquo;s usually easier for them to get things done; they&amp;rsquo;re normally a little more in touch with things that are going on, especially if you can manage to branch out a little and go outside of the people you work with every moment of the day.&lt;/p>
&lt;p>It&amp;rsquo;s generally one thing when you come to work and do your job. But no matter how large or small the company, you can&amp;rsquo;t entirely separate the work from the personal; personalities come out and become some form of factor. Yes, most people are professional, but at the end of the day you&amp;rsquo;re more likely to help someone that you like, even if you&amp;rsquo;re busy, than someone you don&amp;rsquo;t. Maybe you have a 50-50 chance of being liked, but I&amp;rsquo;d say being disliked certainly isn&amp;rsquo;t going to get you help any faster.&lt;/p>
&lt;p>I know there are a few people reading this and thinking: this is fine, but I don&amp;rsquo;t really want to spend money on eating out every day. In my experience the extra knowledge you gain is well worth the price. When you throw an executive-level person, a developer, a sales guy, a marketing person, and some middle management into a single lunch outing, all come away with a lot of insight into areas of the business they had little exposure to. This comes back to my post about &lt;!-- raw HTML omitted -->leaders and developers&lt;!-- raw HTML omitted -->; if the people in your business only understand what they do and nothing outside it, you&amp;rsquo;re going to be less effective as a whole. Most people in your company don&amp;rsquo;t think this way; by doing it you&amp;rsquo;ll become more effective than the average person.&lt;/p>
&lt;p>And to think, it all starts with something as simple as going out to lunch. It&amp;rsquo;s the reason that from day 1, to being a veteran in a company, when someone asks &amp;ldquo;Do you have plans for lunch?&amp;rdquo; 90% of the time my answer is &amp;ldquo;No, when do you want to go?&amp;rdquo;&lt;/p></description></item><item><title>Building apps from the echo chamber</title><link>/2009/07/23/Building-apps-from-the-echo-chamber/</link><pubDate>Thu, 23 Jul 2009 17:20:04 -0700</pubDate><guid>/2009/07/23/Building-apps-from-the-echo-chamber/</guid><description>&lt;!-- raw HTML omitted --></description></item><item><title>Leaders and Developers</title><link>/2009/07/13/Leaders-and-Developers/</link><pubDate>Mon, 13 Jul 2009 22:56:16 -0700</pubDate><guid>/2009/07/13/Leaders-and-Developers/</guid><description>&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted --></description></item><item><title>Mentoring</title><link>/2009/06/16/Mentoring/</link><pubDate>Tue, 16 Jun 2009 17:29:38 -0700</pubDate><guid>/2009/06/16/Mentoring/</guid><description>&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted --></description></item><item><title>Takeaways from Consulting</title><link>/2009/06/13/Takeaways-from-Consulting/</link><pubDate>Sat, 13 Jun 2009 20:46:39 -0700</pubDate><guid>/2009/06/13/Takeaways-from-Consulting/</guid><description>&lt;!-- raw HTML omitted -->
&lt;ol start="2">
&lt;li>Take ownership of the process&lt;!-- raw HTML omitted -->&lt;/li>
&lt;/ol>
&lt;!-- raw HTML omitted --></description></item><item><title>Why Google Wave Will Fail</title><link>/2009/06/08/Why-Google-Wave-Will-Fail/</link><pubDate>Mon, 08 Jun 2009 01:00:26 -0700</pubDate><guid>/2009/06/08/Why-Google-Wave-Will-Fail/</guid><description>&lt;p>Google doesn&amp;rsquo;t understand social or collaboration. There&amp;rsquo;s not much more to it than that, though for the sake of making this an actual blog post I&amp;rsquo;ll explain a bit more.&lt;/p>
&lt;p>Blogger was huge; it was the place to go if you were creating a blog. There weren&amp;rsquo;t many &lt;a href="https://www.mybloghere.com">www.mybloghere.com&lt;/a>s; many of the largest, most popular blogs on the internet were on blogger. People had accounts, people registered to post comments, people had full-fledged profiles that could have easily preceded a facebook profile page. Google bought blogger and had more than enough resources to grow blogger into a sizable social community. But if you visit today, it looks much the same as it did 5 and almost 10 years ago.&lt;/p>
&lt;p>Google spreadsheets is one of the best online spreadsheet programs, and you can even collectively work on a spreadsheet with others thousands of miles away at the same time. Who do you know that uses google spreadsheets that isn&amp;rsquo;t some form of techie? It likely has a user base of under 1% of spreadsheet users, and it&amp;rsquo;s not because it&amp;rsquo;s missing power features like pivot tables. If you re-branded it as a collaboration tool for working and threw chat/video/whiteboarding into the same application, google would see instant tenfold growth in users, but they don&amp;rsquo;t understand that two users seeing the same thing on a spreadsheet, or seeing what the other types in a single document, isn&amp;rsquo;t collaboration!&lt;/p>
&lt;p>To jump ahead of the curve, the counterarguments I&amp;rsquo;ve already heard are around ads and gmail. Gmail: google didn&amp;rsquo;t improve email, they simply give you lots of space for free; if gmail were to cease to exist tomorrow, users would simply jump over to yahoo or microsoft. Ads: google changed the ad industry by making search effective. They&amp;rsquo;re good at algorithms and such, but they don&amp;rsquo;t get users and collaboration, and at their current rate they never will.&lt;/p>
&lt;p>Wave isn&amp;rsquo;t meant to just improve email, it&amp;rsquo;s meant to be a tool for collaboration, to view a conversation as an entity, and google just doesn&amp;rsquo;t get the conversation part.&lt;/p></description></item><item><title>The benefit of leverage</title><link>/2009/05/23/The-benefit-of-leverage/</link><pubDate>Sat, 23 May 2009 20:13:53 -0700</pubDate><guid>/2009/05/23/The-benefit-of-leverage/</guid><description>&lt;p>Due to many recent events, which I&amp;rsquo;m sure I&amp;rsquo;ll disclose later, I&amp;rsquo;ve been in an interesting situation of a good bit of leverage. While leverage can of course be taken advantage of and misused, it also plays a very fair role in business. When hiring a new college graduate in most cases you take the offer you are given, some are able to negotiate for a higher salary, but most are quite unsuccessful. This is because they don&amp;rsquo;t have any leverage. If you ended up walking away from the job offer they would simply hire another college graduate. While yes you may have a lot of potential, it&amp;rsquo;s only that potential and not proven.&lt;/p>
&lt;p>Additionally, within a corporation the company will often do just enough to keep an employee there. If an employee does a great job, they get a pat on the back. If an employee is indispensable (though no company will ever admit to this), they may get a noticeable reward, but it still doesn&amp;rsquo;t usually cover the value the individual is actually providing.&lt;/p>
&lt;p>The responsibility to get what you are truly worth usually lies with the employee. The hard part is knowing when and how to use your leverage. First you must actually have leverage; this is commonly the potential revenue you would bring in, or internal knowledge that you hold. Though I&amp;rsquo;m sure others have varying experiences, mine has been to make your dissatisfaction with a situation known, but in a light manner. Meanwhile, make it visible that you&amp;rsquo;re open to other opportunities as they come along; this can be via twitter, a blog post, or water cooler talk. The final thing, and hopefully the easier one, is to make it clear that the leverage is real: the sale should be a big one, or the internal knowledge costly for them to lose.&lt;/p>
&lt;p>The most unfortunate part of all of this is that, in my experience, leverage is typically needed just to get a fair deal. The very fact that leverage is required means the deal isn&amp;rsquo;t truly fair, but at least knowing this ensures you&amp;rsquo;re not left out in the cold.&lt;/p></description></item><item><title>Why Twitter Is About To Get Old</title><link>/2009/05/08/Why-Twitter-Is-About-To-Get-Old/</link><pubDate>Fri, 08 May 2009 21:23:41 -0700</pubDate><guid>/2009/05/08/Why-Twitter-Is-About-To-Get-Old/</guid><description>&lt;p>Twitter has finally hit mainstream; it was bound to happen, and with Ashton, Oprah, and Shaq, among many others, it&amp;rsquo;s now going to be around for a while. This means a lot of interesting things for twitter, such as the scalability to handle this new massive growth, which will be much more sustained than the sparse spikes they would see before. But as a user of twitter it means something far different: twitter is about to run out of usefulness. Before, twitter was a nice resource for regular communication with micromessaging; now it&amp;rsquo;s quickly going to become one of the noisiest things on the web. This would be fine, IF there were ways to manage the noise. However, with twitter you either get really focused drops or the entire firehose; there is no middle ground. Sure, twitter searches can be nice, but keeping up to date on what you care about requires maybe 20-30 constant searches.&lt;/p>
&lt;p>If someone is able to find some way to manage the firehose of information, it will prove as valuable a tool as twitter itself. But if that doesn&amp;rsquo;t happen in a reasonable time, twitter is going to get really exciting to a lot of people, and just as quickly turn a lot of people off.&lt;/p></description></item><item><title>Takeaways for a startup</title><link>/2008/11/04/Takeaways-for-a-startup/</link><pubDate>Tue, 04 Nov 2008 19:19:30 -0800</pubDate><guid>/2008/11/04/Takeaways-for-a-startup/</guid><description>&lt;p>I&amp;rsquo;ve learned a great deal since being out in the valley. First is the confirmation that I do love the atmosphere. Second, that I really miss the fall. But more importantly, I&amp;rsquo;ve learned a lot that I feel is useful in a startup environment. The startup environment and business model is a very unique one, especially in recent years: it seems you don&amp;rsquo;t even need a business model to get someone to give you $10 million and hope you come out with one at some point. And in some cases it works, I mean it did for google, but the failure stories are a lot more abundant than the success ones.&lt;/p>
&lt;p>Here&amp;rsquo;s a quick rundown of how I think one can build a successful startup in any economy, and why I feel our current state is prime for someone following these steps.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Have a business model. Yeah, it&amp;rsquo;s less glamorous than a facebook that has millions of people logging on each hour. Take for example the guys at 37signals: they&amp;rsquo;re probably happy to get a million uniques in a month, or perhaps even a year. But per employee, per their cost, their revenues are at least 10x if not 100x facebook&amp;rsquo;s. Why is this the case? Simply put, they have a business model. They build some product and people want it, not want it in the sense that they will spend hours aimlessly using it because it&amp;rsquo;s free and convenient; it accomplishes some actual goal, saves people time, and makes their lives easier. Oh, and something I&amp;rsquo;ve probably said too many times on here: ads aren&amp;rsquo;t a business model, unless you&amp;rsquo;re specifically an ad company.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Get competent employees. I want to take this a step further and say to get employees that believe in the technology but also in the business. It&amp;rsquo;s one thing when you&amp;rsquo;re at a large company to have someone that&amp;rsquo;s extremely specialized, but at a startup everyone wears a variety of hats. Your secretary could land a big lead on a sale, your intern developer could come up with your future marketing slogan, and because of this everyone you bring on needs to be fully on board with every aspect of your business. If you&amp;rsquo;re growing slower than you hoped, that&amp;rsquo;s fine; it&amp;rsquo;s worth it to be short-staffed rather than overinflated.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Part of the reason you&amp;rsquo;d rather be short than inflated is that you don&amp;rsquo;t want to take on capital. To quote someone I&amp;rsquo;ve recently come into contact with, &amp;lsquo;you want to get off the tit as fast as possible&amp;rsquo;. To explain a little more: most startups take sizable investments from VCs or other parties to get going. The problem is that you then have to answer to them. If you&amp;rsquo;re the one with the idea, with the vision, why would you want to give up any control of that? The fact is you shouldn&amp;rsquo;t. Going back to points 1 and 2, if you have the business model it shouldn&amp;rsquo;t be long before you see revenue, and if all your employees believe in the company, they won&amp;rsquo;t expect a cushy salary; they&amp;rsquo;ll take ownership in the company as part of their compensation.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Keep your employees happy; the easiest way to do this is pizza and beer. It&amp;rsquo;s simple, but it works. If you have happy employees enjoying what they do, they&amp;rsquo;ll work harder and longer. The smaller you are, the more poisonous it is to have employees that don&amp;rsquo;t fit in and embrace your culture. Sure, diversity can be a good thing, and you do need some balance of it, but more than that you don&amp;rsquo;t want to ruin the atmosphere and camaraderie that come along with a startup. Those small things help: pizza and beer for 20 to get an extra hour of work, plus the intangibles of building relationships that help them work better, is far cheaper than paying them extra dollars to work late.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Don&amp;rsquo;t overspend. This may seem in contrast to 4, and oftentimes people go to one extreme or the other. Pizza and beer make sense; lobster lunches do not. I don&amp;rsquo;t care if you are google, they still don&amp;rsquo;t make sense. If you are doing extremely well and want to give back to employees, give it in cash; they&amp;rsquo;ll appreciate it more. But ordering in lunches every day and pool tables are not necessary expenses.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>If there&amp;rsquo;s enough response I&amp;rsquo;ll follow up with some of the more tangible ways of making this happen, but for now, enjoy these big picture takeaways.&lt;/p></description></item><item><title>All the bubbles haven't burst yet</title><link>/2008/11/03/All-the-bubbles-havent-burst-yet/</link><pubDate>Mon, 03 Nov 2008 00:51:54 -0800</pubDate><guid>/2008/11/03/All-the-bubbles-havent-burst-yet/</guid><description>&lt;p>As I watch the news and posts roll in each day with new layoffs in the valley, ranging from large corporations such as HP and EA down to the small guys such as seesmic, imeem, searchme, and zillow, to name only a few, there still seems to be a demand for &lt;!-- raw HTML omitted -->certain job skills&lt;!-- raw HTML omitted -->. As I look down the list, some of these skills I don&amp;rsquo;t feel are in demand any longer, and others soon won&amp;rsquo;t be. I want to call attention to facebook first. Everyone and their brother, when launching a website, wants to build a facebook app to deliver some of their content onto facebook, but the time and effort put into this is nowhere near the return. The market has become so flooded that the penetration you will get is quite trivial. Furthermore, CPMs for advertising on facebook have already plummeted, in some cases to around .10-.15.&lt;/p>
&lt;p>Meanwhile, iphone developers are still rushing to get their ideas into the app store. While that&amp;rsquo;s a decent idea, and I more than support new applications so that I can use them on my phone, a 10 million person marketplace is still extremely small, especially if you&amp;rsquo;re not a mainstream application. It used to be that one million users on your website was the golden point for social networks, when you could really go for significant funding or start talking a selling price. That&amp;rsquo;s not the case anymore, and to reach that number on the iphone you would have to be somewhere between the top 10 and top 25 elite apps.&lt;/p>
&lt;p>So sure, you might not have that many users download your app, right? But there are ad revenue possibilities. Well, that works right now; the iphone is seeing insane CPMs, in some cases as high as $50. This is simply not sustainable. While some claim that mobile advertising is the holy grail of ads, that only works if you can capture user intent, which isn&amp;rsquo;t so simple. It works for search because when I&amp;rsquo;m searching for something, that usually captures my intent; not so when I&amp;rsquo;m using my phone. I strongly suspect that as these new hot markets calm down, companies will do proper analysis of ROI and no longer want many of these niche skills.&lt;/p>
&lt;p>In these cases I think there are going to be a lot more small bubbles bursting, and a lot more niche developers searching for something more than developing the occasional website for the store down the street.&lt;/p></description></item><item><title>Ads is not a business model</title><link>/2008/09/13/Ads-is-not-a-business-model/</link><pubDate>Sat, 13 Sep 2008 03:10:15 -0700</pubDate><guid>/2008/09/13/Ads-is-not-a-business-model/</guid><description>&lt;p>I recently attended part of the Techcrunch 50 conference, and when I wasn&amp;rsquo;t there I was watching much of it online. For probably 80% of the companies, when it came time to ask about their business model, they said ads. Then they talked about how, because they have all of this great information about the user, they can advertise better than they used to. The problem is they&amp;rsquo;re forgetting all about user intent. This is why ads on facebook simply aren&amp;rsquo;t working, with some CPMs being as low as .05.&lt;/p>
&lt;p>Ads work on search because users are looking for something; if you place an ad for it they&amp;rsquo;re fine with it, because they didn&amp;rsquo;t want to stay on the search page anyway. When a user goes to facebook they want to stay on facebook, not leave. When a user is in an application they want to stay in the application. As long as the site or application is a destination, or resides on a destination, it will not make great revenue from traditional ads. Yeah, there are opportunities for newer creative advertising, but this form of ads will not provide the same revenue &lt;!-- raw HTML omitted -->yet &lt;!-- raw HTML omitted -->as others. If you&amp;rsquo;re launching a product or site, actually consider how you&amp;rsquo;re going to make money; just saying ads in most cases is not a business model.&lt;/p></description></item><item><title>Google did something right . . . . Finally</title><link>/2008/09/04/Google-did-something-right-.-.-.-.-Finally/</link><pubDate>Thu, 04 Sep 2008 17:21:56 -0700</pubDate><guid>/2008/09/04/Google-did-something-right-.-.-.-.-Finally/</guid><description>&lt;p>Forget the benchmarks, forget whether it&amp;rsquo;s truly faster or slower, forget whether the market share is 30% for non-IE browsers (though is this only for the US, or internationally?). Google Chrome, evolution or revolution, whatever you want to call it, makes me actually want to stay in the browser. I just want plugins that function as well as the browser alone does. Yeah, there are rendering problems and some oddities, but the browser as a whole is smooth. I actually feel thus far it&amp;rsquo;s the best of IE and safari melded together. The only problem I see right now is the lack of plugins, which my guess is will come VERY SOON. Meanwhile firefox, with the plugins I want enabled, can be sluggish; if Chrome plugins are of equivalent quality then I can&amp;rsquo;t see how the browser wouldn&amp;rsquo;t be at LEAST as smooth.&lt;/p>
&lt;p>Chrome really is a win for Google, whether they can monetize it or not. It helps them to keep people off the desktop and in the browser. I could go on for hours about bad moves they&amp;rsquo;ve made, such as picasa, but Chrome was actually a good one.&lt;/p></description></item><item><title>Being an employee</title><link>/2008/08/29/Being-an-employee/</link><pubDate>Fri, 29 Aug 2008 17:44:19 -0700</pubDate><guid>/2008/08/29/Being-an-employee/</guid><description>&lt;p>As I currently work at a startup, I have a small stake in the company. When talking with one friend about something I have been working on with someone on the side, the question came up of whether this was a conflict of interest. I was actually quite shocked to hear the question at first; I expected them to be doing likewise, as I know many that do. The full-on conflict of interest statement just shocked me. Being at a startup makes it a slightly more interesting question, but I received similar comments sometimes at my former Fortune 100 employer. I&amp;rsquo;ll start with that place and then migrate to the startup environment.&lt;/p>
&lt;p>I could not disagree more that being an employee somewhere and working on additional things is a conflict of interest. In short, you are an employee, not property; your best interests lie with yourself. Sure, it&amp;rsquo;s great if you believe in the company and what they do, but in our generation you are not attached for life to the company you work for. The company has claim on what you do between 8 and 5 with regards to work. Sure, if you do things that may damage the company brand or your effectiveness to do business, it&amp;rsquo;s fair for them not to retain you, but simply doing additional work in your spare time? Hardly!&lt;/p>
&lt;p>Now we move on to the startup atmosphere, where it&amp;rsquo;s pretty standard that when you become employed you receive some amount of equity in the company. In most cases, if you&amp;rsquo;re not a founder, this stake is relatively small. Sure, you could point to Google, where I believe over 400 employees were made millionaires by the IPO, but these situations are very rare. The equity you receive normally vests over a period of time, and from my perception is simply equivalent to a portion of your pay, no more, no less. Sure, it does make you feel more of a sense of ownership, but it does not extend to the business owning you.&lt;/p>
&lt;p>As an employee you&amp;rsquo;re being paid to perform a job; they don&amp;rsquo;t have full claim to what you do on your own time.&lt;/p></description></item><item><title>A Lesson from the Wal-Mart Model</title><link>/2008/08/19/A-Lesson-from-the-Wal-Mart-Model/</link><pubDate>Tue, 19 Aug 2008 03:43:42 -0700</pubDate><guid>/2008/08/19/A-Lesson-from-the-Wal-Mart-Model/</guid><description>&lt;p>Many people criticize Wal-Mart for the way they run their business. I personally find no problem with it, as their goal is simply to make prices competitive. If you care about the other details, then either A. shop elsewhere, or B. donate the money you save to those causes you feel should be supported. While some of these qualms may be justified, I&amp;rsquo;d like to hint at another thought: why don&amp;rsquo;t people take advantage of the same approaches?&lt;/p>
&lt;p>You see, I recently started using a service, which I&amp;rsquo;d prefer not to disclose yet, that gives me access to getting very monotonous and tedious tasks completed for a very low price. Indeed there is some overhead involved, but the key is learning to manage it effectively. In reality anyone can manage, but the vast majority over-manage things rather than giving them just the right amount of attention. But back to the primary point: taking tasks that you normally wouldn&amp;rsquo;t do because of their tedious, low-value nature and getting them completed for a very low cost can become extremely valuable for you. When you start to think about what you&amp;rsquo;d do if you had more time for those tasks, hundreds of things probably come to mind, so I&amp;rsquo;d encourage everyone to explore the low cost options for work/support and try to leverage them to your advantage.&lt;/p></description></item><item><title>Why Qik Matters</title><link>/2008/08/14/Why-Qik-Matters/</link><pubDate>Thu, 14 Aug 2008 17:15:34 -0700</pubDate><guid>/2008/08/14/Why-Qik-Matters/</guid><description>&lt;p>Live video streaming from your phone might just seem like another form of lifecasting, a video form of twitter, or even a mobile version of ustream.com or justin.tv, but it really is far more than that. A few people have taken these mobile streaming services such as Qik, Flixwagon, and Kyte and really used them to their fullest capacity. Sure, you can go to an extreme like Robert Scoble, though admittedly most of his content from Qik can be pretty good.&lt;/p>
&lt;p>The real thing about mobile video streaming is something that some people have mentioned about twitter. Twitter often breaks the news, or has more information about breaking news than anywhere else. As much as anything else, mobile streaming allows for quick live footage of events, whether it&amp;rsquo;s breaking news or an impromptu interview that someone happens to come across. Those willing to embrace this new form of reporting, and sacrifice expensive editing and high end video equipment, will come away with more interesting pieces and find a rapidly growing following of consumers that tune in.&lt;/p>
&lt;p>Mobile video streaming is far, far bigger than ustream and justin.tv. It&amp;rsquo;s about more than showing the day to day happenings of your life whenever you want; it&amp;rsquo;s about continuous access to news and happenings.&lt;/p></description></item><item><title>Don't Do It Yourself</title><link>/2008/08/09/Dont-Do-It-Yourself/</link><pubDate>Sat, 09 Aug 2008 16:08:32 -0700</pubDate><guid>/2008/08/09/Dont-Do-It-Yourself/</guid><description>&lt;p>Traditionally I&amp;rsquo;ve been a very do-it-yourself person. I wanted to be the person that didn&amp;rsquo;t have to rely on anyone, and thus far it&amp;rsquo;s worked pretty well. I can handle my day to day chores as well as do my job, and most any side project or venture I take on I feel like I can accomplish pretty well. However, I recently asked myself whether that was the best approach. While it&amp;rsquo;s good to be self-reliant, the people at the top seldom do everything on their own. If I take notice of where I spend my time, most of it is on tasks that produce very, very little value. Meanwhile, there are other tasks, where I only have time to spend a few hours a week, that produce much larger value.&lt;/p>
&lt;p>On this point, what I&amp;rsquo;m looking into is outsourcing many of these tasks. While it will take additional time to manage that process and delegate the tasks, I believe I&amp;rsquo;m currently at a break-even point where it makes sense. From there it can only go uphill, not down. I&amp;rsquo;ll likely post again in a few weeks after I&amp;rsquo;ve seen how this has progressed, but it seems that for those accomplishing a lot, it&amp;rsquo;s in large part because others do so much for them.&lt;/p></description></item><item><title>Fotoviewr</title><link>/2008/08/08/Fotoviewr/</link><pubDate>Fri, 08 Aug 2008 14:34:49 -0700</pubDate><guid>/2008/08/08/Fotoviewr/</guid><description>&lt;p>In the coming weeks I&amp;rsquo;m going to be working with a friend to help his venture in a new partnership that someone has approached him about. This partnership is rather large and I&amp;rsquo;m not at liberty to disclose details yet, and while this is indeed great news for the site, the great news for users is that the site is already fully available.&lt;/p>
&lt;p>In short, &lt;!-- raw HTML omitted -->fotoviewr&lt;!-- raw HTML omitted --> is one half of an online photo album: it doesn&amp;rsquo;t store your photos for you, but allows you to take your existing photos from &lt;!-- raw HTML omitted -->flickr&lt;!-- raw HTML omitted --> or &lt;!-- raw HTML omitted -->smugmug&lt;!-- raw HTML omitted --> (with other support coming soon) and instantly put them into a visually appealing gallery. To me the nicest thing about it is the variety. I&amp;rsquo;ve used piclens before, and while I love piclens for scrolling through 1000s of pictures quickly, that seems to be the extent to which I use it. I can&amp;rsquo;t use piclens to show off my photos to friends or family. While browsing through a standard flickr page isn&amp;rsquo;t a bad experience, it isn&amp;rsquo;t a good one. &lt;!-- raw HTML omitted -->Fotoviewr&lt;!-- raw HTML omitted --> really does seem to deliver the piece that is missing from online photo sites.&lt;/p>
&lt;p>Check out a few of the samples below. I&amp;rsquo;ve taken a few of my personal pictures from the past 4th of July and put them into a few of the views that are available. All in all this took me a fraction of the time it took to type this post, which is how more things online should be: simple.&lt;/p>
&lt;!-- raw HTML omitted --></description></item><item><title>How Ebay Missed the Boat</title><link>/2008/08/08/How-Ebay-Missed-the-Boat/</link><pubDate>Fri, 08 Aug 2008 00:05:21 -0700</pubDate><guid>/2008/08/08/How-Ebay-Missed-the-Boat/</guid><description>&lt;p>In a conversation today we got into a discussion about how ebay can compete with amazon, which alone is enough content for an entire post, as they really aren&amp;rsquo;t playing the same game and so aren&amp;rsquo;t really competing. Instead I&amp;rsquo;d like to talk about where the conversation progressed to. To me the most interesting thing about ebay isn&amp;rsquo;t how they won the long tail, or how users are unhappy with the increasing costs placed on them. It&amp;rsquo;s where they really lost out.&lt;/p>
&lt;p>You see, ebay had a devoted following back around 2000. Millions of people would visit their site, buy items, and focus on their feedback rating, often spending time in the same category. While at the time most of this was indeed innovative, they stopped there. If only they had taken it a step further and exposed the final piece of the puzzle by allowing users to interact with each other. If they had developed a network of users that could communicate with each other, that shared interests based on product categories, they could have easily been one of the largest early social networking sites out there. Facebook initially exploded as a social network for college students, other more recent ones focus on younger crowds, and meanwhile you have linkedin as a professional network. Ebay could have very much been the network for collectors, or anyone purchasing similar items. Not only would this have increased user engagement, it would have driven more sales.&lt;/p>
&lt;p>Take for example if I&amp;rsquo;m a fan of a particular brand of jeans. I may be very proud of the brand I&amp;rsquo;ve found, but recently discovered some that were newer and more hip. Well, I&amp;rsquo;m going to be a bit hesitant to share that with the friends I hang out with, as I want to be the one that&amp;rsquo;s hip versus everyone else having them as well. However, if I have a similar community that I connect with online, I can share this information, build a rapport with them, and still maintain my step ahead of friends. It allows your users to become your product recommendation engine, versus the Amazon approach of throwing a lot of machine learning at the problem.&lt;/p>
&lt;p>And the most unfortunate part of it all: I&amp;rsquo;m not sure ebay even realizes they missed the boat.&lt;/p></description></item><item><title>How social networking advertising should work</title><link>/2008/07/22/How-social-networking-advertising-should-work/</link><pubDate>Tue, 22 Jul 2008 18:10:12 -0700</pubDate><guid>/2008/07/22/How-social-networking-advertising-should-work/</guid><description>&lt;p>Social advertising is very different from traditional web advertising. The thing about social media advertising is that users aren&amp;rsquo;t looking for a product. They&amp;rsquo;re already engaged in some activity, and don&amp;rsquo;t necessarily want to be drawn away from it. But that does not mean there isn&amp;rsquo;t value in it; it&amp;rsquo;s just a different form of value than search advertising.&lt;/p>
&lt;p>When a user is searching on google, they are actually looking for something. They&amp;rsquo;re often looking for products, which is why advertising in its current form works great on search. It&amp;rsquo;s logical, if a user is searching for shampoo, that Procter and Gamble would want to pay to have their products show up on the right. This is vastly different from when I&amp;rsquo;m playing a racing game on facebook and Ford shows me a commercial for their product; they&amp;rsquo;re not the same thing.&lt;/p>
&lt;p>However, there is still HUGE value in social advertising. Since you know so much more about a user, you can target them even better: you can know age, gender, music/movie preferences, interests, and hobbies, among many other things. Where this can help is branding. If there is a car game, and you can present a solid brand in places throughout that game, I become brand loyal without realizing it. I don&amp;rsquo;t become disengaged, but I do take notice of it. It&amp;rsquo;s much like the branding that takes place in movies or console games, with large negotiated contracts, however it needs to occur on a micro-level.&lt;/p></description></item><item><title>Microsoft vs. Apple</title><link>/2008/07/16/Microsoft-vs.-Apple/</link><pubDate>Wed, 16 Jul 2008 00:19:48 -0700</pubDate><guid>/2008/07/16/Microsoft-vs.-Apple/</guid><description>&lt;p>I find it extremely amusing that Microsoft and Apple are in many senses the very same company, at least in their actions, yet people feel very differently about the two. The average person isn&amp;rsquo;t really a fan of Microsoft, and many love Apple. While I&amp;rsquo;m not really suggesting anyone should love Microsoft, why are people such Apple fanboys? Apple makes the same bad moves as Microsoft: they control their software and limit functionality in order to drive sales in the future.&lt;/p>
&lt;p>For example with the iPhone, by disabling video streaming they are simply leaving something to be supported next year. There&amp;rsquo;s no doubt that the phone is fully capable, especially with 3g, as qik is already supporting it. However, they are having to jump through hoops to do it, when Apple could have simply enabled it in the SDK, and yet they didn&amp;rsquo;t. It&amp;rsquo;s unfortunate for AT&amp;amp;T in the process as well, because people who are die hard Apple fans feel Apple can do no wrong. This wasn&amp;rsquo;t quite the case in recent days. First, with the launch of the iPhone 3g, there were many, many bricked iPhones that morning. Almost every complaint I saw on twitter drew attention to AT&amp;amp;T screwing it up. However, from a source very close to the issue, the problem was entirely on Apple&amp;rsquo;s end, as they had tested for only a fraction of the traffic they got that day, and were not able to scale up new machines nearly fast enough.&lt;/p>
&lt;p>Now Apple&amp;rsquo;s woes seem to continue with &lt;!-- raw HTML omitted -->MobileMe&lt;!-- raw HTML omitted -->. For all the fanboys out there: while I agree they make a good product, they should still be held to the same standard as anyone else that makes a product, and be complained to when they screw up. Apple has indeed done a great job with marketing and a reasonable job with products, however they keep a strong rein on applications, which is why I like the applications that are on OSX, but hate that there are so few of them.&lt;/p>
&lt;p>I&amp;rsquo;m not saying love Microsoft, or even hate Apple. But people should judge them on their actions, and while they push the boundaries, there&amp;rsquo;s still no harm in calling them out when they hold back just for more revenue.&lt;/p></description></item><item><title>Facebook apps worth using</title><link>/2008/07/12/Facebook-apps-worth-using/</link><pubDate>Sat, 12 Jul 2008 19:30:15 -0700</pubDate><guid>/2008/07/12/Facebook-apps-worth-using/</guid><description>&lt;p>Facebook applications to check out:&lt;/p>
&lt;p>Windows:
&lt;!-- raw HTML omitted -->Digsby&lt;!-- raw HTML omitted --> - Facebook IM on your desktop
&lt;!-- raw HTML omitted -->Fonebook&lt;!-- raw HTML omitted --> - sync outlook and facebook
&lt;!-- raw HTML omitted -->iDeskbook&lt;!-- raw HTML omitted --> - Browse facebook on the desktop
&lt;!-- raw HTML omitted -->Photosaver&lt;!-- raw HTML omitted --> - Friends&amp;rsquo; photos as your screensaver&lt;/p>
&lt;p>OSX:
&lt;!-- raw HTML omitted -->Friend Photos Screensaver&lt;!-- raw HTML omitted --> - Friend&amp;rsquo;s photos as your screensaver
&lt;!-- raw HTML omitted -->Facebook exporter for iphoto&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted -->Adium&lt;!-- raw HTML omitted --> - Chat client with facebook support
&lt;!-- raw HTML omitted -->Photobook&lt;!-- raw HTML omitted --> - Missed your camera at an event? Just steal your friend&amp;rsquo;s album
&lt;!-- raw HTML omitted -->EventSync&lt;!-- raw HTML omitted --> - Sync event calendar with iCal&lt;/p>
&lt;p>Web:
&lt;!-- raw HTML omitted -->Wordpress fotobook&lt;!-- raw HTML omitted --> - Facebook albums inside your wordpress blog&lt;/p></description></item><item><title>Conversation aggregators vs. social network aggregators</title><link>/2008/07/10/Conversation-aggregators-vs.-social-network-aggegators/</link><pubDate>Thu, 10 Jul 2008 18:44:02 -0700</pubDate><guid>/2008/07/10/Conversation-aggregators-vs.-social-network-aggegators/</guid><description>&lt;p>I recently posted about web 2.5, and since that time have been diving into two sites that attempt to do this. The first is &lt;!-- raw HTML omitted -->friendfeed&lt;!-- raw HTML omitted -->; I&amp;rsquo;ve commented about it before. It&amp;rsquo;s overall a great site, however the community is still growing on it, and most of my personal friends are not on there, only those that I follow and interact with in a tech or professional community. And there&amp;rsquo;s the ability to go through and create an imaginary personality for friends, but for me that could take days, and while it&amp;rsquo;s still tempting I can&amp;rsquo;t quite commit that strongly. Yeah, friendfeed is great, but I find myself using it more for having a conversation with whoever is there, rather than using it to follow individual people.&lt;/p>
&lt;p>With the emergence of rooms in &lt;!-- raw HTML omitted -->friendfeed&lt;!-- raw HTML omitted --> it seems they realize it&amp;rsquo;s more about being able to have a conversation around a similar topic than it is about tracking individual people.&lt;/p>
&lt;p>However it seems that &lt;!-- raw HTML omitted -->socialthing&lt;!-- raw HTML omitted -->, which I recently got access to (thanks, &lt;!-- raw HTML omitted -->socialthing&lt;!-- raw HTML omitted --> team), is a slightly better aggregator, at least for my demographic. Friendfeed works well for those that use blogs, google reader, photo albums and the like. But friendfeed is seriously lacking on the facebook front, while socialthing handles it very well. I&amp;rsquo;m not sure which one I&amp;rsquo;ll be more engaged in over the coming weeks, though I imagine it will depend on the purpose.&lt;/p>
&lt;p>Friendfeed works for following information about tech, news, or similar broad topics. Socialthing works for keeping up with friends, when they&amp;rsquo;ve uploaded pictures from last friday night, or when that girl you had a crush on in high school breaks up with a long-term boyfriend and needs a rebound, and the like. I&amp;rsquo;m not sure how friendfeed would work if they did enable the same type of features for facebook; I imagine it might not catch on, as right now it&amp;rsquo;s about conversations more than it is a singular feed. Socialthing has a chance to win this one, but they really need to support more than the 10 services you can currently connect to.&lt;/p></description></item><item><title>The problem with facebook's platform is the problem isn't the platform</title><link>/2008/07/08/The-problem-with-facebooks-platform-is-the-problem-isnt-the-platform/</link><pubDate>Tue, 08 Jul 2008 15:42:46 -0700</pubDate><guid>/2008/07/08/The-problem-with-facebooks-platform-is-the-problem-isnt-the-platform/</guid><description>&lt;p>Facebook&amp;rsquo;s development platform just over a year ago seemed like a genius idea, with an almost infinite amount of potential. It&amp;rsquo;s still a very hot topic, and most sites these days attempt to have a facebook version of their site or service available almost as soon as they launch. However, I believe we are already over the peak of this, as more controls are being put in place to slow viral growth, and users are spending less time on the site and engaged in the applications.&lt;/p>
&lt;p>My problem though is not with the slowing rate of engagement in applications developed on the facebook platform, but rather with what the primary applications are. Facebook seems to have done a very good job of keeping users within the confines of the site, rather than letting them simply use it as a utility. For most, facebook is their personal planner for events, their personal datebook for friends/contacts, their online photo album, their email/messaging system, and more for some. And while it&amp;rsquo;s fine and dandy for some of these things, facebook is not the best endpoint to interact with when getting things to and from facebook.&lt;/p>
&lt;p>Take for example the facebook chat. This is a great utility to be able to talk with friends that I may not have spoken with in years; my AIM list has around 200 users, meanwhile my facebook has over 500. No, I do not wish to speak to all of these all the time, but on the rare occasion that I do it&amp;rsquo;s convenient. However IM chat within a browser just doesn&amp;rsquo;t do it for me, not on facebook, or meebo for that matter. The nice thing is that there are solutions, with more being developed. Personally if I&amp;rsquo;m at home I use Adium (Mac only) for my instant messaging, which supports facebook. If I&amp;rsquo;m on a windows machine I use digsby to chat with my facebook friends and monitor what friends are doing. While digsby isn&amp;rsquo;t a perfect solution, I strongly prefer it to the other option of chatting within the browser.&lt;/p>
&lt;p>What about pictures? That&amp;rsquo;s probably the single busiest activity outside of updating status. Here I know of multiple friends that have attempted to use the site&amp;rsquo;s interface for uploading pictures, only to finish in double the time expected, with much more frustration than anticipated. Meanwhile, I simply select the photos I want to upload in iPhoto (there are options for mac and pc here), select export, click facebook, and off they go. This is the way it should be; I can do likewise for smugmug, flickr, etc.&lt;/p>
&lt;p>Facebook has done a reasonable job of giving developers access to facebook to allow them to build reasonable applications. While there&amp;rsquo;s a lot of junk out there, there are also some reasonable applications that really make facebook a useful utility. The problem is that these seem to be hidden gems; whether it&amp;rsquo;s facebook or some third party, someone needs to start bringing these to the attention of others. Unless facebook transitions to being strictly a utility and differentiates itself on the quality of service the utility gives, and less on its UI and stronghold of data, it will be in for a world of hurt in a few years.&lt;/p></description></item><item><title>Web 2.5</title><link>/2008/07/03/Web-2.5/</link><pubDate>Thu, 03 Jul 2008 19:03:41 -0700</pubDate><guid>/2008/07/03/Web-2.5/</guid><description>&lt;p>I&amp;rsquo;ve talked about web 2.0, talked about web 3.0, but today realized there&amp;rsquo;s still a middle ground we have to reach in between the two. It&amp;rsquo;s quite a pain that I really have no idea when my friends do certain things online. While some use facebook for absolutely everything, this is most certainly NOT the best option. Throwing your data into their walled garden is one thing, but for this to be the one and only place you store your online data is quite stupid. Facebook will only open up when they&amp;rsquo;re absolutely forced to, and may not even open up then. To migrate &amp;rsquo;notes&amp;rsquo; or rather blog posts out of facebook, or all of your pictures, or your messages can be an absolute pain. Why not use a service built for just those things, such as a wordpress blog, or flickr/picasa, or twitter/jaiku? Well most people don&amp;rsquo;t because of the simplicity of facebook being the central place for your data and your friends&amp;rsquo; data.&lt;/p>
&lt;p>Well, there is a solution to it; though it&amp;rsquo;s not ideal yet, it will soon hit a tipping point where it will be the solution. But first I guess I should clearly lay out the problem:&lt;/p>
&lt;p>web 2.0 - the dynamic web emerged, users started publishing content
&amp;hellip;. Mass amounts of data, problems getting to it all &amp;hellip;..&lt;/p>
&lt;p>thus in the future we have&amp;hellip;
web 2.5 - content aggregation became necessary, via friendfeed&lt;/p>
&lt;p>and eventually&amp;hellip;
web 3.0 - the semantic web, services understand you and your needs and provide content around context&lt;/p>
&lt;p>In short this is a small plug for friendfeed, but if anyone else knows of a better service to, in essence, create a feed of you, please send them this way. I&amp;rsquo;ll be posting a full review on friendfeed soon, but for the time being just want to point out the value in such a service. Right now I post on multiple sites: I twitter, I blog, I use facebook, I use smugmug, I use picasa, I use last.fm, I use ilike, I use librarything, I use tumblr, I use google talk, among others. While personally I might be a little more invested than most, the point remains that a lot of people are on more than one of these services. While I know them and follow them on the ones I know about, chances are I will never see their flickr accounts, or last.fm accounts. While some people worry about privacy and this being a stalker&amp;rsquo;s nightmare, I really don&amp;rsquo;t see it making things that much easier. Much less, most people are making a pretty big assumption in thinking that they&amp;rsquo;re worth being stalked. I personally hope I could have a stalker come out of such, as it would give me a definitive answer that someone actually reads this and finds me interesting. I just hope she&amp;rsquo;s 5'6&amp;quot;, and a blonde bombshell, but then again I&amp;rsquo;ll take whatever stalkers I can get.&lt;/p>
&lt;p>But the point remains that before we get to web 3.0 and the ability to deliver content based on context, we need to aggregate the content. Sites like friendfeed (and eventually socialthing) are a reasonable first step.&lt;/p>
&lt;p>Check out my friendfeed at: &lt;!-- raw HTML omitted -->&lt;a href="https://www.friendfeed.com/craig081785">www.friendfeed.com/craig081785&lt;/a>&lt;!-- raw HTML omitted -->&lt;/p></description></item><item><title>A Generation</title><link>/2008/06/29/A-Generation/</link><pubDate>Sun, 29 Jun 2008 18:51:47 -0700</pubDate><guid>/2008/06/29/A-Generation/</guid><description>&lt;p>I&amp;rsquo;ve written many posts here about business, technology, and the like. The reason I&amp;rsquo;ve been so delayed in updating, in addition to the busyness of life, is because this post has been brewing in my head for quite some time. I just haven&amp;rsquo;t been able to sit down and actually compose it until now for some reason.&lt;/p>
&lt;p>So many of my posts have been about the web and how things will change in the future. While this post is still strongly related, I want to talk a bit more about the social aspect: how I feel the next generation will incite change in much of the world because of the web. With the web and all of its utilities, youtube, aim, twitter, email, people no longer feel they&amp;rsquo;re separated by thousands of miles. It&amp;rsquo;s also allowed any old joe to take on the form of publisher. I&amp;rsquo;ll concede that my generation watches more television than any generation prior, that sex and STDs are at a higher rate than ever before. But in large part I believe the whole of the generation is following and simply consuming the information put out by the generation ahead of them.&lt;/p>
&lt;p>BUT, there is a small group within the generation that cares about change. That small group is now able to gather a mass following, to cause others to think, and to actually make a difference. This generation seems to have done this through simply expressing themselves. To them it&amp;rsquo;s not about making money, or having a cult following, it&amp;rsquo;s about personal expression.&lt;/p>
&lt;p>Here are a few examples of what I mean about people expressing themselves:&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->wefeelfine.org&lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>&lt;!-- raw HTML omitted -->Kiva.org&lt;!-- raw HTML omitted -->&lt;/p></description></item><item><title>The site that gets it right</title><link>/2008/06/19/The-site-that-gets-it-right/</link><pubDate>Thu, 19 Jun 2008 17:17:55 -0700</pubDate><guid>/2008/06/19/The-site-that-gets-it-right/</guid><description>&lt;p>This week I want to talk about one of the hands-down best sites on the internet: Mint.com. In case you haven&amp;rsquo;t heard about Mint yet, it&amp;rsquo;s like Quicken or Microsoft Money, just online. You create an account on the website, login, and add your account information. From there mint connects to each of your accounts, pulls down your transaction history, automatically sorts your spending into categories, and then will send you alerts for budgets or other settings via text or email. Oh, and best of all, since it knows where you&amp;rsquo;re spending your money, it tells you how you can save.&lt;/p>
&lt;p>So since mint sounds great and wonderful, and it indeed is, I&amp;rsquo;m going to jump straight in and start addressing issues people may already have about this kind of site. The first is security: why would I give all of my account information away to a single place so someone could walk in and take every penny I have? Well first, account information is encrypted; it&amp;rsquo;s not just sitting in some text file on some desktop, it&amp;rsquo;s quite secure. Next, mint gives you warnings, right? So if someone goes and buys a car with your credit card you&amp;rsquo;ll get a text message about it. Now I may be missing something, but my bank has never offered me that kind of service. Oh and best of all, just because you put in your account information, you still have the normal security backing and liability protection of your bank.&lt;/p>
&lt;p>Worried about a company having so much information on you? What if I told you that in a matter of minutes, if I know your name, I could likely have your past 3 residences, phone numbers, and other information. Or for that matter, if you&amp;rsquo;re concerned with a company having that information, do you pay cash for everything? Because if you don&amp;rsquo;t, the credit card companies have just as much information, and they and other companies often sell this information. Mint.com &lt;!-- raw HTML omitted -->promises not to do such&lt;!-- raw HTML omitted -->. So if your argument is that you don&amp;rsquo;t want people to know that much information about you, it&amp;rsquo;s a very valid one, but hypocritical if you don&amp;rsquo;t always use cash, which in ways can still be traceable.&lt;/p>
&lt;p>Finally I just want to highlight my favorite thing about mint.com. They get web 3.0: sure they have a rich interface, decent categorization, and good alerts. But best of all, they know where I spend my money, they know generic stuff about me, and because of that they can recommend to me ways I can actually save money. Now it may be just me, but I&amp;rsquo;m pretty sure everyone out there would like to save money. So it&amp;rsquo;s great to be shown ways that I truly can save money, not typical propaganda that is a waste of time for me.&lt;/p></description></item><item><title>Site Review: Friendfeed</title><link>/2008/06/12/Site-Review-Friendfeed/</link><pubDate>Thu, 12 Jun 2008 23:37:30 -0700</pubDate><guid>/2008/06/12/Site-Review-Friendfeed/</guid><description>&lt;p>There&amp;rsquo;s been a lot of buzz in the valley lately around this very small startup, which has a few pretty heavy hitters. Between the four founders they have worked on nearly all of the Google products so many know and love, with the exception of search. &lt;!-- raw HTML omitted -->Paul Buchheit&lt;!-- raw HTML omitted --> is even responsible for Google&amp;rsquo;s current motto, &amp;ldquo;don&amp;rsquo;t be evil&amp;rdquo;. These four guys not only are visionaries within the web space, they also know how to deliver a product; having helped build and scale gmail and google maps is indeed a noteworthy accomplishment.&lt;/p>
&lt;p>But what about their current task at hand, to be &lt;!-- raw HTML omitted -->web 3.0&lt;!-- raw HTML omitted --> and &lt;!-- raw HTML omitted -->reduce the noise of all of the web 2.0&lt;!-- raw HTML omitted --> tools out there? Well, first let me summarize what friendfeed does. When you sign up for friendfeed you add your web 2.0 accounts (currently supporting 35), some of note being: facebook, google talk, iLike, digg, twitter, flickr, picasa, youtube, yelp, and others. &lt;!-- raw HTML omitted -->Friendfeed &lt;!-- raw HTML omitted -->then creates a feed of you, so you can send the link to anyone and they can have a single source for updates to all of your web 2.0 interactions. &lt;!-- raw HTML omitted -->Friendfeed &lt;!-- raw HTML omitted -->does do a little more than that though; they attempt to filter out some of the noise by grouping your interactions together. For someone like Robert Scoble, who on a given day could post 1000 tweets, you likely don&amp;rsquo;t want to see each one as a single line item. Friendfeed will group these and give you a short preview, then allow you to drill down.&lt;/p>
&lt;p>All-in-all &lt;!-- raw HTML omitted -->friendfeed &lt;!-- raw HTML omitted -->is a reasonable service and will continue to be talked about in the valley for the coming year, and then spread elsewhere in the world. However there are some problems with the service. First is the lag time: due to the restrictions of some of the services they connect to, sometimes your feed is twenty minutes behind your original posts/updates. Though this is no fault of their own, it is nonetheless something users will not be excited about.&lt;/p>
&lt;p>But more importantly &lt;!-- raw HTML omitted -->friendfeed &lt;!-- raw HTML omitted -->doesn&amp;rsquo;t have a concept of context. This would be my number one complaint: they&amp;rsquo;re not approaching web 3.0 yet. Quite possibly my favorite site (well, second to &lt;!-- raw HTML omitted -->twitter&lt;!-- raw HTML omitted -->), which will be reviewed next week, does a great job of understanding you and your context. When it recommends something it does so based on your history and its knowledge of you, and it&amp;rsquo;s often right. Indeed grouping messages together does have value, but until it can show me the messages I &lt;!-- raw HTML omitted -->want&lt;!-- raw HTML omitted --> to see and hide the ones I do not, I won&amp;rsquo;t be amazed.&lt;/p>
&lt;p>Whether or not you should be on it strictly depends on your involvement in web 2.0 sites. If you&amp;rsquo;re on more than 5 of the sites listed in their 35, it may be a worthwhile investment. While it won&amp;rsquo;t make the noise quiet, it will likely reduce it by 10-20%, which is better than nothing.&lt;/p>
&lt;p>Other sites to watch out for (if they ever release): &lt;!-- raw HTML omitted -->socialthing &lt;!-- raw HTML omitted -->&lt;/p>
&lt;p>For those interested, &lt;!-- raw HTML omitted -->my friendfeed&lt;!-- raw HTML omitted -->&lt;/p></description></item><item><title>iPhone 1.1</title><link>/2008/06/11/iPhone-1.1/</link><pubDate>Wed, 11 Jun 2008 12:54:32 -0700</pubDate><guid>/2008/06/11/iPhone-1.1/</guid><description>&lt;p>So this week they announced what many expected was coming: the iPhone 3G. However off the shelf it&amp;rsquo;s still not web 2.0; while a great device, it&amp;rsquo;s not a web 2.0 device yet. Apple without a doubt understands user experience, but they do not fully grasp web 2.0 yet. Microsoft seems to have an even better understanding with the products they are looking to roll out with Mesh and their enterprise social/collaboration tools. Lots of great applications were highlighted at the keynote, but only one of them talked about publishing content (with the exception of mobileme, which is a paid service). While there&amp;rsquo;s no doubt I will be getting the new iPhone when it is released in July, I will not claim it is a great web 2.0 device.&lt;/p>
&lt;p>If the application store is as open as Apple alludes to it being, then I can see how it will quickly become a web 2.0 device. Loopt is likely the strongest contender for helping to build a location based social network, and when they release for the iPhone they can turn it into a web 2.0 device. I&amp;rsquo;ll be most anxious to see how the push based services they announced will help to allow developers to turn it into a web 2.0 device as well. If I had to have my application constantly up, it just wouldn&amp;rsquo;t work out as a full enabler for web 2.0. BUT if you allow notifications to be regularly pushed, it simplifies and increases the regularity of community and people staying in touch.&lt;/p>
&lt;p>The tipping point though, for me at least, will be if or when Apple finally allows video on the device. No, not playing video, but recording and streaming video. The kind of abilities available on a nokia n95, or available on my apple computer through iChat. When I can pull my phone out of my pocket and display to the world what I&amp;rsquo;m doing or where I&amp;rsquo;m at, you will have a device that allows you to communicate and most of all collaborate like no other before it. It doesn&amp;rsquo;t require more power than is already there; in fact I can record video on my jailbroken phone right now. The nokia n95 does a great job of streaming live video to qik, which is how I watched much of the Apple Keynote. It simply takes Apple finally understanding web 2.0 and embracing it.&lt;/p></description></item><item><title>How not to be successful in the valley</title><link>/2008/06/10/How-not-to-be-successful-in-the-valley/</link><pubDate>Tue, 10 Jun 2008 18:48:53 -0700</pubDate><guid>/2008/06/10/How-not-to-be-successful-in-the-valley/</guid><description>&lt;p>While I may or may not know how to be successful in silicon valley, I feel pretty confident that I can point out a few ways to not be successful in valley terms. What follows are my thoughts on how you can best limit yourself from owning, running, or being involved in the entrepreneurial spirit of the valley.&lt;/p>
&lt;p>The first is keeping yourself in a bubble; by not diving into the new technologies, new services, and new age of the web there&amp;rsquo;s little possibility you can be at the next steps of it. While I concede you don&amp;rsquo;t always have to explain it or understand it, you at least need to &lt;!-- raw HTML omitted -->use it&lt;!-- raw HTML omitted -->. The pretentious attitude of standing against something just for the sake of it won&amp;rsquo;t get you very far when people attempt to find you and communicate and can&amp;rsquo;t. There will be few individuals in the future similar to Jobs and Ellison that are a box of mysteries that no one has access to. Instead you will simply filter the noise down to what is relevant, and regardless your presence will be felt. It&amp;rsquo;s not only about making your life easier with useful tools like mint or dropbox and thinking about the next useful utility, but also about communicating and relating to others. Final thought &amp;hellip; facebook: be on it; twitter: use it (don&amp;rsquo;t understand it, don&amp;rsquo;t explain it, just use it); friendfeed (jury&amp;rsquo;s out, but you better know what it is).&lt;/p>
&lt;p>The second biggest thing you can do is to be patient. So many seem to sit around, wanting to have their own big thing, but waiting for that one great idea to come to them. In most cases one probably does; the only problem is they&amp;rsquo;re not seasoned or practiced at building something. Yeah, a few get lucky on the first try, but as a whole, for those that are successful it&amp;rsquo;s due to persistence and not patience. Waiting for the right time in the market, the right time in your life, or just the right idea is wasting time you can&amp;rsquo;t get back. If you truly want to run something, start running something, and when the right idea does finally come along you&amp;rsquo;ll be prepared to build it up and run with it.&lt;/p>
&lt;p>The third thing you can do is not network. Yeah, it&amp;rsquo;s easier than before to build a product and get people to adopt it because of the web, but that doesn&amp;rsquo;t mean you can do it on your own. If you want to have a great idea with a lot of potential go to waste, sit at home on a Friday night and work away alone, and you will have no worries about having too much traffic or too many users. Most likely your idea will only appeal to you, missing various features and the needs of some of the users that would have been happy to tell you what they wanted.&lt;/p>
&lt;p>Fourth, spend all your time networking. So you go to the events, meet the people, know people to fund you, have a great idea, and finally decide you&amp;rsquo;re going to actually start working on a product. The same night you sit down to code, you read of your product launching with someone else, with less funding, less knowledge, and less experience, all because they&amp;rsquo;ve actually been working on it. It&amp;rsquo;s a fine balance, but err on the side of not having &lt;!-- raw HTML omitted -->every&lt;!-- raw HTML omitted --> connection that you will need for a successful launch, and instead having a working demo or product to show to the connections that you do have.&lt;/p></description></item><item><title>The value in content as a commodity</title><link>/2008/06/09/The-value-in-content-as-a-commodity/</link><pubDate>Mon, 09 Jun 2008 19:22:55 -0700</pubDate><guid>/2008/06/09/The-value-in-content-as-a-commodity/</guid><description>&lt;p>There was a recent post over at &lt;!-- raw HTML omitted -->ReadWriteWeb&lt;!-- raw HTML omitted --> about how content is becoming a commodity. I don&amp;rsquo;t believe many people would argue with this. While at first this wasn&amp;rsquo;t something I viewed in a positive light, the more I think it over the more I see some value in it. As content does become more and more of a commodity, the value in who is publishing or producing that content goes up. Five years ago if someone had talked about some trade secret they learned about from Google, many people would have perked up and listened. Today however rumors spread faster than ever, and before you know it google calendar now predicts when your appointments are, puts them in your calendar, and sends you text messages with directions 30 minutes before each meeting. While this may very well happen some day, I&amp;rsquo;m pretty sure they won&amp;rsquo;t be rolling it out next week.&lt;/p>
&lt;p>Now with so much content being produced and spread all over the place, there is still some value in the quality of content (which there always has been). However, now there is a much heavier focus on the source it came from. This comes down to the principle of branding, and not necessarily like a wal-mart or coke brand, but personal branding. My guess is that in the future, and by future I mean sooner rather than later (likely 2-3 years), personal brands as a whole will carry more weight than the companies those people work for.&lt;/p>
&lt;p>Take for example &lt;!-- raw HTML omitted -->Digg&lt;!-- raw HTML omitted -->: when &lt;!-- raw HTML omitted -->Digg &lt;!-- raw HTML omitted -->makes some announcement or takes a stance, a few people listen and it does get noticed. But this is only when the content or opinion falls within the &lt;!-- raw HTML omitted -->Digg &lt;!-- raw HTML omitted -->world. Meanwhile when &lt;!-- raw HTML omitted -->Kevin Rose&lt;!-- raw HTML omitted --> makes an announcement, many more people than just those in the realm of &lt;!-- raw HTML omitted -->Digg &lt;!-- raw HTML omitted -->notice. He&amp;rsquo;s a prime example of someone that has built a personal brand; his name carries weight, and more so than that of &lt;!-- raw HTML omitted -->Digg&lt;!-- raw HTML omitted -->, &lt;!-- raw HTML omitted -->Pownce&lt;!-- raw HTML omitted -->, or &lt;!-- raw HTML omitted -->Revision3 &lt;!-- raw HTML omitted -->(even though they are nearly one and the same).&lt;/p>
&lt;p>Some might view content becoming a commodity as a bit of a blow. I believe we will see more individuals emerge with an understanding of personal branding, in part to help us sort through the content, but also to provide quality content in repeatable manners.&lt;/p></description></item><item><title>Nearsighted Business</title><link>/2008/06/05/Nearsighted-Business/</link><pubDate>Thu, 05 Jun 2008 18:26:27 -0700</pubDate><guid>/2008/06/05/Nearsighted-Business/</guid><description>&lt;p>Adobe&amp;rsquo;s former CEO, Bruce Chizen, when asked &amp;lsquo;What advice do you have for new/young public companies?&amp;rsquo;, gave a response of &amp;lsquo;Go private&amp;rsquo;. While partially a joke, he went on to elaborate on something that many businesses seem to miss. The main idea is that businesses are very nearsighted in their focus; they look at quarterly goals and in some cases yearly, but not where they want to be in 10 years. When companies become worried that head count is high they simply freeze hiring across the board without thinking of the ramifications. Good people, well, great people are truly hard to find, and when a company enforces a blanket hiring freeze they miss out on those few great people that they truly need to grow. Meanwhile when they decide they have bandwidth for 1000 new employees they open the flood gates and let the first 1000 that can spell their name correctly in, because they can. This short term focus in the long run greatly limits what a company can achieve.&lt;/p>
&lt;p>In contrast, a smaller company that is actually much more at risk of dying seems to have a better understanding of what their approach should be. While they may be understaffed and overworked, they firmly understand that they only have so many funds and therefore make wiser decisions when using them. The characteristics of the larger business and their shortsighted focus lead to the cyclical performance that many experience over a few years, rather than the very steady growth they would like to maintain, and often lead to their end. A prime example would be IBM, which laid off so many of their more talented people to lower their numbers years ago. As a result they lost their best workers and had many inexperienced individuals in there when they began hiring again. They are still working to catch back up to where they once were in the IT industry&amp;hellip;&lt;/p>
&lt;p>In part I wonder if being a private entity is really the only way to have the long term focus and not worry about quarterly earnings. Personally I have never been privy to this insight and may not be for some time, but I don&amp;rsquo;t foresee shareholders being patient enough to not punish a company if earnings are not so hot for a year or even a few quarters. So there may be nothing a large business can do about it, but I am surprised that more do not try.&lt;/p></description></item><item><title>Blurring the lines, the flip side</title><link>/2008/06/04/Blurring-the-lines-the-flip-side/</link><pubDate>Wed, 04 Jun 2008 23:23:33 -0700</pubDate><guid>/2008/06/04/Blurring-the-lines-the-flip-side/</guid><description>&lt;p>So yesterday I posted about blurring the lines around your personal and professional life. Today I&amp;rsquo;d like to discuss a bit of the opposite: when companies blur the lines. Hopefully you don&amp;rsquo;t, but at first you may wonder how this is possible. Well, first let me highlight some of what many feel are the negative pieces, such as when an HR rep who went to the same university as you looks up your facebook profile. You, thinking this is private to friends at school, may not regularly monitor the content that goes up there. This specific case is a very interesting one, and for the moment I&amp;rsquo;d like to stay clear of it.&lt;/p>
&lt;p>Instead I&amp;rsquo;d like to focus on some of the areas where businesses undoubtedly should improve, and some notable ones that are doing that very well. I first would like to take (as per my usual) an example from twitter. Twitter is many things for many people, but it is extremely common to find people on twitter ranting about a particular service. While I may complain all I want about &lt;!-- raw HTML omitted -->Comcast &lt;!-- raw HTML omitted -->to my friends, they have little ability to do much to counteract this. But when I do it on &lt;!-- raw HTML omitted -->twitter &lt;!-- raw HTML omitted -->they do, and &lt;!-- raw HTML omitted -->Comcast &lt;!-- raw HTML omitted -->is one example of a company that is taking a very pro-active approach to managing their online presence. I take an example from a few months ago, when the internet had been out at the house of Michael Arrington, the founder of &lt;!-- raw HTML omitted -->TechCrunch&lt;!-- raw HTML omitted -->. He twittered about this outage, and had a personal phone call from an executive in Philadelphia a few hours later. Sure, he&amp;rsquo;s a noticeable guy, so they responded to him. To me the attention getter was when someone called comcast&amp;rsquo;s bluff that they only follow the important people, and that any old average joe doesn&amp;rsquo;t get a response. That particular average joe did get one, with comcast saying they make the best effort they can to reach out to everyone (&lt;!-- raw HTML omitted -->HR Block&lt;!-- raw HTML omitted -->, &lt;!-- raw HTML omitted -->Southwest &lt;!-- raw HTML omitted -->are also doing this quite well).&lt;/p>
&lt;p>The other is a little less about blurring the lines, but I still think a great example of a business stepping out of its traditional role. This one was a story relayed to me by a co-worker. &lt;!-- raw HTML omitted -->Zappos&lt;!-- raw HTML omitted -->, which is primarily an online shoe company, has a great return policy: if you don&amp;rsquo;t like the shoes after you receive them, just return them, and they&amp;rsquo;ll even pay for the shipping. In this specific incident, a woman&amp;rsquo;s husband had recently passed away and she was attempting to return the shoes. She gave the reasoning behind the return, and the customer service rep went an extra step and sent flowers to the funeral. While to some this might be offensive, or at the very least outside the bounds of a normal company, it shows a company actually caring for people, which isn&amp;rsquo;t their core business. While I won&amp;rsquo;t go into details of how well this turned out for the company, it&amp;rsquo;s a great example of a company blurring the lines.&lt;/p>
&lt;p>In the future I anticipate we will see more and more companies doing this, actually caring about what someone says/thinks/feels not only about the company, but personally as well. As we get to this point it will become less about the buck and more about meeting needs.&lt;/p></description></item><item><title>Mixing personal and professional, Transparency</title><link>/2008/06/04/Mixing-personal-and-professional-Transparency/</link><pubDate>Wed, 04 Jun 2008 02:57:16 -0700</pubDate><guid>/2008/06/04/Mixing-personal-and-professional-Transparency/</guid><description>&lt;p>Someone recently commented on my tweeting, after I had given my two weeks&amp;rsquo; notice to my company, that it may not have been the best idea. So as a result I&amp;rsquo;d like to post my reasoning and thoughts on why I feel it is actually a great idea. Even a month before I gave my two weeks&amp;rsquo; notice, many knew it was likely going to happen, and I personally hope that this allowed those that may have needed to, to better plan accordingly. This also, I believe, allows me to be evaluated more on my actual merit in some terms. You see, if I had made it clear that I might be looking at another opportunity and was not wanted, I believe there is good reason to think hints would have been given that I should look harder. Instead it was a reasonable bit of the opposite.&lt;/p>
&lt;p>However the larger point I&amp;rsquo;d like to illustrate is that many of those that I work with I view as more than strictly co-workers. It&amp;rsquo;s not uncommon to meet up with them on a Saturday for drinks or to just hang out. Meanwhile I have many friends who I do not talk to on a daily or even weekly basis, yet I still like to keep them informed of what I&amp;rsquo;m up to, just as I look to follow how they are doing. To me twitter and other social tools are this medium. While some may prefer to keep the professional and the personal strictly separate, I have no problem with the two blurring. In fact I believe it adds to what I bring to the table in both contexts. By no means do I live and breathe work, but to me my work is not only a job either. There are times after 5 or 6 o&amp;rsquo;clock that I will spend reading about business and technology, simply because they interest me. And there are times between 8 and 5 that I will have an IM conversation unrelated to work, or post tweets available to the rest of the world.&lt;/p>
&lt;p>Hopefully some may value this transparency, though if you do not, don&amp;rsquo;t add me on facebook, don&amp;rsquo;t follow me on twitter. You&amp;rsquo;ll still get the formal good-bye email with the rest of the group; however, I wouldn&amp;rsquo;t expect you to get updates in 5 years, though that may be what you hope for. Personally I like to keep in touch with those that I enjoy working with, but also care to know how they are doing in their personal lives. To me blurring the lines makes the relationships deeper and life more rewarding.&lt;/p></description></item><item><title>Adobe came to play</title><link>/2008/06/03/Adobe-came-to-play/</link><pubDate>Tue, 03 Jun 2008 00:30:10 -0700</pubDate><guid>/2008/06/03/Adobe-came-to-play/</guid><description>&lt;p>I&amp;rsquo;m pretty sure I&amp;rsquo;ve said it before, but in case I haven&amp;rsquo;t voiced it on here yet: watch out for Adobe. They look like they&amp;rsquo;re firing on all cylinders lately and don&amp;rsquo;t look to be slowing down. At Adobe MAX 2007, I saw a preview of many things that were coming up for them. Having already worked with Adobe Flex, I was a fan, at least of the Flex and AIR products. Adobe now seems to be doing as much to drive adoption of the products as they expect from developers.&lt;/p>
&lt;p>First came the acquisition of Buzzword, which is a Flex-based word-processing application, with additional abilities for users to collaborate that, in my opinion, were already slightly superior to Google&amp;rsquo;s equivalent. Then came Adobe Photoshop Express, in a sense a lighter version of Adobe Photoshop, available only online and, best of all, for &lt;!-- raw HTML omitted -->free&lt;!-- raw HTML omitted -->. While impressive in their own right, they really just did a great job of showcasing the power of Flex.&lt;/p>
&lt;p>Now with acrobat.com Adobe seems to have hit a home run. While it may only be a landing place for their in-house web applications, it&amp;rsquo;s a landing place that points to a lot of great tools. Namely Adobe Connect, which at least matches, if not surpasses, every virtual meeting tool I&amp;rsquo;ve ever used. With well integrated white-boarding, screen sharing, file sharing, web cams, and dial-in support as well, it will surely be my choice for online meetings in the near future. Once the other tools, such as Buzzword and potentially other applications, join it, others such as Google and Microsoft will very much have to watch out.&lt;/p></description></item><item><title>Changing etiquette?</title><link>/2008/06/02/Changing-etiquette/</link><pubDate>Mon, 02 Jun 2008 06:16:31 -0700</pubDate><guid>/2008/06/02/Changing-etiquette/</guid><description>&lt;p>A recent conversation about someone who was offended when, upon being introduced to someone new, she was not greeted first since she was a female, brought what follows to mind. The above is a train of thought that came from a 70-year-old military wife. I do not believe this is common practice today, and it is quite rarely found as the expected etiquette, but nonetheless I think what is proper etiquette in business is changing quite rapidly. Though I&amp;rsquo;m not sure if &lt;!-- raw HTML omitted -->all&lt;!-- raw HTML omitted --> of the older ideas and principles have gone away.&lt;/p>
&lt;p>I take as a first example Zuckerberg, who is a notoriously difficult interview. Not because he keeps things hidden, or is sealed tight about the company, but rather because his soft skills are not his strength. His strength is building a web product that millions of people find worthwhile enough to devote hours of their day to.&lt;/p>
&lt;p>Even two years ago, when you were disgruntled with a company you may have gotten a few drinks in you and talked to a friend about your displeasure. But it certainly was not made fully public for anyone to see. At best you could only hope you were privy to things that would be brought to the public&amp;rsquo;s eye from a larger misdoing, either a legal one or one that a mass crowd found a problem with. But for simply being overworked, underpaid, or in some other way mistreated, there was no politically correct outlet to speak through.&lt;/p>
&lt;p>However in recent years it has become extremely common for those that are still employed, or were employed, to voice their complaints and bring to light details that were once hidden. I think of Zed Shaw&amp;rsquo;s rant on rails, which calls out specific companies, or an older blog, the diary of a mac genius, which gave detailed behind-the-scenes information from an Apple customer support genius bar. While I&amp;rsquo;ll concede that for the vast majority, if it&amp;rsquo;s published it doesn&amp;rsquo;t mean it&amp;rsquo;s consumed, so there is no dramatic effect on any single business, I still find it hard to believe that companies will be able to stop this overall shift toward users freely publishing. As we approach web 3.0 and have a better ability to pull in a larger base of more relevant information, this information may become more and more helpful to users.&lt;/p>
&lt;p>Regardless of the effects, it seems the standard procedures for what is proper etiquette are changing. Whether it&amp;rsquo;s talking about your place of employment, or HR checking out your facebook profile to see if you&amp;rsquo;d be a risk for the company, the lines are being blurred from both sides and the barriers that once existed are now being torn down.&lt;/p></description></item><item><title>Is web 2.0 more utopian?</title><link>/2008/05/29/Is-web-2.0-more-utopian/</link><pubDate>Thu, 29 May 2008 17:36:50 -0700</pubDate><guid>/2008/05/29/Is-web-2.0-more-utopian/</guid><description>&lt;p>I recently read an article on Techcrunch on how web 2.0 had undoubtedly made an impact, but had yet to truly make money. From my stance there is really one way to make money on the web, which is through advertising (paid subscription services are dying). This can either be done through a simple banner ad, or something that can more easily be deemed a qualified lead or referral. Ads will always be there, but if web 2.0 is to start making money it must be by improving the measurement and throughput of qualified leads.&lt;/p>
&lt;p>Without going into too much detail on referrals and qualified leads, I&amp;rsquo;d like to mention a great example of this: mint.com. Mint offers a great free service for managing your personal finances. Setting aside security and sharing your data (another time, another place), they do a great job of categorizing and monitoring your finances. In exchange they have access to your spending history. So you give them your spending history, and in exchange they have hundreds of thousands of people&amp;rsquo;s data, such as what credit card you use and who your cable provider is. With this, Mint does what I believe is a good job of generating referrals, by telling you how much you can save by switching from Internet provider A at $60 per month to Internet provider B at $30 per month. They&amp;rsquo;re not just giving ads for the sake of it; they&amp;rsquo;re giving me something that I would actually want.&lt;/p>
&lt;p>Does this lead to higher or lower revenue? Well, this I&amp;rsquo;m not sure of, but with regards to advertising, I feel this is a much better fit for a user. I only get things that the service provider thinks &lt;!-- raw HTML omitted -->I want&lt;!-- raw HTML omitted -->, not what they &lt;!-- raw HTML omitted -->think they can sell me&lt;!-- raw HTML omitted -->. As we reach more of a semantic understanding of the web I believe this will prevail and make web 2.0 more profitable, but it cannot be done with web 2.0 alone.&lt;/p></description></item><item><title>Consuming versus Publishing</title><link>/2008/05/22/Consuming-versus-Publishing/</link><pubDate>Thu, 22 May 2008 23:16:08 -0700</pubDate><guid>/2008/05/22/Consuming-versus-Publishing/</guid><description>&lt;p>I&amp;rsquo;m finding it an increasingly interesting balance between consuming and producing information. I also feel that with the increase in availability of information, the barriers to entry in so many areas are shrinking rapidly. I think of some of the areas where the barriers to entry are typically much higher. 15 years ago we would never have seen a 23-year-old as the CEO of a company valued at 15 billion dollars. While my vantage point even within the business market is narrow, as I think in relation to technology I still believe it applies across the board. Now it no longer takes 20 years in business, learning from your experiences and mistakes, to have the knowledge and ability to run a large company. In fact, in some cases that may be a hindrance, because you will expect that your previous experiences have prepared you to handle the situation, which you may not view as different. The difference in today&amp;rsquo;s world is that the businesses that will still be around in 5, 10, and 20 years are the ones that manage to adapt to change extremely quickly.&lt;/p>
&lt;p>But I digress; I&amp;rsquo;m starting to find it increasingly difficult to determine what the balance is between creating and consuming content. I read on average 200 blog posts per day, while posting an average of two. I see and read an average of 500 tweets per day, while posting an average of 4. I listen to and watch an average of 3 podcasts per day, and publish none. Meanwhile I find myself having a variety of conversations about all the things I read, yet few of these are web communications, or at best they happen over IM. While there should always be a learning curve in getting familiar with an industry space or domain expertise, at what point should it tip so that you can become a publisher of content?&lt;/p>
&lt;p>More specifically I&amp;rsquo;m curious what the actual positions of those that consume and those that publish are, both currently and from a long-term perspective. Because I consume content, does it make me more likely to take and readily apply it? Or because I am polishing and publishing content, does it mean that when I interact with others it will be more polished, perhaps allowing me to sell better?&lt;/p>
&lt;p>Really it&amp;rsquo;d be great to get some sort of focus-group stats around this&amp;hellip;
Technorati Tags: &lt;!-- raw HTML omitted -->content&lt;!-- raw HTML omitted -->, &lt;!-- raw HTML omitted -->twitter&lt;!-- raw HTML omitted -->, &lt;!-- raw HTML omitted -->blogs&lt;!-- raw HTML omitted -->, &lt;!-- raw HTML omitted -->publish&lt;!-- raw HTML omitted -->, &lt;!-- raw HTML omitted -->consume&lt;!-- raw HTML omitted -->&lt;/p></description></item><item><title>Reduced noise in exchange for transparency</title><link>/2008/05/21/Reduced-noise-in-exchange-for-transparency/</link><pubDate>Wed, 21 May 2008 17:52:52 -0700</pubDate><guid>/2008/05/21/Reduced-noise-in-exchange-for-transparency/</guid><description>&lt;p>As I&amp;rsquo;ve become more or less a web 2.0 whore, I&amp;rsquo;ve also had a great interest in web 3.0 and what it will foretell. Most believe natural language and the semantic web will play a large role in that. And while they will, that will not be the end result of web 3.0. While web 2.0 included AJAX and Flex, those really don&amp;rsquo;t fully encompass what it is. Web 3.0, to sum it up most simply, will be about reducing the noise of the web. While I can take very little credit for this idea, as I have heard others say the same or at least similar things, there is an interesting side that I believe most have not thought about.&lt;/p>
&lt;p>You see, in order to reduce the noise of the web you have to know about me and what I consider noise. In order for someone to do this we have to be willing to give up information about ourselves, some of which people consider private. I still recall a conversation, which I posted on a few days ago, about users not wanting to give out their private information. I believe this attitude is very quickly becoming old hat; while there are individuals that will stay this way for several decades, as a collective whole it&amp;rsquo;s a fleeting attitude. I think for example of mint.com, which I willingly give all of my financial account information to in order for them to simplify my life. Instead of a massive collection of emails and notifications I get summarized views from them. While there still is the chance for noise, as I could receive text messages about every transaction that happens, I now have the ability to filter that noise.&lt;/p>
&lt;p>Noise is something that some people love; take Scoble for example, who loves having hundreds of twitter messages fly across his screen every few minutes. But for the vast majority, reducing the noise, to allow us to accomplish more in a day but also have more time to enjoy it, will be the key to the future of the web.&lt;/p>
&lt;p>I&amp;rsquo;ve talked with some that believe government policy will come after people start to become too open with their information. My perspective is that as long as there are safeguards around that information, there will be few barriers to users giving it away freely very soon. But in truth only time will tell how well companies and products can reduce this noise and truly learn about a user, and whether there will be regulation preventing such improvements.&lt;/p></description></item><item><title>Why Twitter matters to you</title><link>/2008/05/18/Why-Twitter-matters-to-you/</link><pubDate>Sun, 18 May 2008 15:57:21 -0700</pubDate><guid>/2008/05/18/Why-Twitter-matters-to-you/</guid><description>&lt;p>Products Company - For me this may be the most obvious and strongest reason for any of the four groups to begin using Twitter. It&amp;rsquo;s quite simple: people use your product or service, people talk about your product or service, and people ARE talking about your product or service on Twitter. Regardless of whether you want it or not, people are going to talk and they&amp;rsquo;ll speak their minds. I don&amp;rsquo;t believe I know of a company that wouldn&amp;rsquo;t want to know what its consumers are saying about them, much less one that wouldn&amp;rsquo;t want to be part of that conversation. And they have the chance to at least be part of it over twitter if they so choose.&lt;/p>
&lt;p>Enterprise - The enterprise has two main reasons why I feel twitter is of importance. Those were highlighted in a little more detail in a previous post, but to sum them up&amp;hellip; Twitter is an asynchronous IM that you could roll out within an enterprise and have control over. You would own the messages, which allows companies concerned about data security to feel more at ease. It is also a convenient medium to convey status to a team and reduce meetings while increasing knowledge flow.&lt;/p>
&lt;p>Additionally twitter starts to become a place for brainstorming and fleshing out ideas. On twitter I follow people with similar interests in similar spaces. As I have a fraction of an idea, or encounter a problem, I post it to twitter. Others throw in their opinions, and a few hours or days later I have hundreds of thoughts around it, and hopefully a solution or a fleshed-out thought.&lt;/p>
&lt;p>Power user - This is an interesting group, because I feel web 3.0 will change in many ways how this group works. Few users like a lot of noise, few users like to see hundreds of messages fly across their screen each hour, and even fewer are truly productive with this. However, for the time being, there are users that want that noise, and that only seem to become more productive and thrive further the more of it they have. For those users twitter is a gold mine; you can&amp;rsquo;t find a place where you can get much more unfiltered noise, with the exception of the web as a whole. While in its most raw form twitter is extremely noisy, there are a variety of methods to filter this, which is where twitter seems to become most useful to the rest of the population.&lt;/p>
&lt;p>Consumer - If you haven&amp;rsquo;t fit yourself into any of the above categories then this will be where you fall. If you use facebook, email, or IM to communicate with friends, then twitter will be your friend in coming years. I remember a time when, at what seemed almost a set time at night, I would see my buddy list on AIM start to grow as the usual friends signed on. I wouldn&amp;rsquo;t talk to them all each day, but over the period of a week I would talk to most on there, get updates on their lives, and make distant plans to catch up at some point. As the world seems to spin faster, our lives become more hectic and complicated with each passing day. Some nights I sit at my machine and see people on AIM that I haven&amp;rsquo;t talked to in over a year. I debate whether to reach out to them to hear their latest and greatest news, or to avoid some of the awkwardness. Though I genuinely do care and would like to know, it&amp;rsquo;s not always a simple and relaxed conversation to feel a part of their life.&lt;/p>
&lt;p>Facebook has done a reasonable job of overcoming this, but facebook has hit the tipping point, where you now seem to know more people on there than you actually do. People spend more time wasting time than actually reconnecting, and at the end of the day its original goal has seemingly started to fade.&lt;/p>
&lt;p>Twitter on the other hand allows me to have conversations with friends over a longer period of time. I don&amp;rsquo;t have to sit and wait for my friend to respond; he gets to it when he can. But for me it still maintains that direct-communication feel of AIM. I can also communicate from one to many people much more easily; now instead of sending out an email to all my contacts I post a tweet, and all of my friends get the message. All-in-all what twitter really gives you is a way to still feel close and connected to those people. While a facebook status message may work just the same, it somehow seems to get lost amongst all the noise.&lt;/p></description></item><item><title>Web, Scalability, SAAS</title><link>/2008/05/16/Web-Scalability-SAAS/</link><pubDate>Fri, 16 May 2008 19:46:45 -0700</pubDate><guid>/2008/05/16/Web-Scalability-SAAS/</guid><description>&lt;!-- raw HTML omitted -->
&lt;p>At the time, one startup that would be a prime example of what I&amp;rsquo;m about to detail did not come to mind: Salesforce. Salesforce has a respectable feature list, yet is also one of the web companies that has managed to scale well. The interesting thing about web companies is that the best ones are the ones that can truly scale to a mass audience. Many can offer an okay service, but scaling that service is a truly difficult task.&lt;/p>
&lt;p>Many would argue that extensive testing and rigorous QA will help to offer enterprise-quality software. But on this note I very strongly disagree; in coming years we are going to see more and more of the facebook model. As Facebook develops code they test against live data; it will be the individuals that coded it and perhaps a few others, but it is really not much more than a smoke test to make sure things are still working. They then roll out the new code, and when things break they are ready to fix them. Why does this work? Why are users okay with the site being down at times? Well, personally I will settle for more features at the cost of a few inconveniences most days of the week. Seldom is software 100% solid; I have even seen bugs in Notepad, so you cannot expect software to be perfect.&lt;/p>
&lt;p>Instead, with a SAAS (software as a service) model you do not have to worry about software updates. You simply update the site or service, and you can roll out features more rapidly. Instead of Office being a 2-3 year large product update, you push new features to the site on a weekly basis. You also alleviate some of the burden of supporting a variety of installations and versions of your product.&lt;/p>
&lt;p>Because of these points it&amp;rsquo;s all the more important that we pay attention to the companies that have gotten scalability right: the Facebooks, Googles, and Amazons of the world that can handle millions of hits per second. Once enterprises have a firm understanding of this, they can attempt to then build their feature sets within a SAAS model. When we get to this point, features will be built faster and software will begin to grow faster than in previous years.&lt;/p>
&lt;!-- raw HTML omitted --></description></item><item><title>Bearish on mobile</title><link>/2008/05/16/Bearish-on-mobile/</link><pubDate>Fri, 16 May 2008 01:05:00 -0700</pubDate><guid>/2008/05/16/Bearish-on-mobile/</guid><description>&lt;p>I&amp;rsquo;ve heard that mobile devices will be everywhere and will replace computers for what seems like decades, but is probably going on 6 or 7 years now. The thing is, I don&amp;rsquo;t really need a desktop with me everywhere I go. Seldom am I somewhere wishing I had access to Microsoft Office, Eclipse, or Visual Studio. While I concede that I occasionally watch youtube on my iPhone, this is far more rare than it is common. Most of the time it&amp;rsquo;s simply for the wow factor that I pull it out and load up the &amp;ldquo;Here comes another bubble&amp;rdquo; video.&lt;/p>
&lt;p>Meanwhile I will admit that I do love certain things about my mobile device. The ability to, within three button clicks, publish a photo to twitter or flickr. Or to quickly do live micro-blogging from an event, or (not I, but others) the ability to stream an interview from my phone and have others text in questions to ask. But is this truly what I use on my desktop? Yes, I have IM and micro-blogging clients, but it&amp;rsquo;s far different. I seem to care more about existing services integrating to support SMS, or mobile applications, than about my mobile device being able to run whatever is on my desktop.&lt;/p>
&lt;p>My primary point here is that I care less about supporting multimedia formats, applications, and power, and more about data and accessibility. Mobile devices are at a point now where I can locate friends, communicate with others, and have access to the small pieces of data I want. We simply need to have the backbone systems support them more.&lt;/p></description></item><item><title>Large corporations versus startups</title><link>/2008/05/15/Large-corporations-versus-startups/</link><pubDate>Thu, 15 May 2008 12:32:00 -0700</pubDate><guid>/2008/05/15/Large-corporations-versus-startups/</guid><description>&lt;p>After a year of living in Silicon Valley it&amp;rsquo;s hard not to be consumed with the excitement around one startup after the next. The hopes of being the next eBay, Google, or Facebook lie within many individuals in the area. Some pursue it, some write it off as a distant hope, and still others just want to be a part of them at some point.&lt;/p>
&lt;p>But why would someone have wanted to just be a part of Google 5 years ago? The first thing that comes to mind is money, and while that&amp;rsquo;s nice, I don&amp;rsquo;t believe it&amp;rsquo;s at the root of it. There&amp;rsquo;s a vastly different atmosphere at a startup. For one, you have a say. Right now if I left my company, I&amp;rsquo;d get good-byes from 100 people, be missed by maybe 20, and the other 159,900 would move along as if nothing happened. But if I left a startup the entire company would notice, so there is a closer sense of belonging. Still I don&amp;rsquo;t believe this is the heart of it.&lt;/p>
&lt;p>In part, if you have the ability to make it at a startup you&amp;rsquo;re of a different breed. If a startup excites you, you&amp;rsquo;re of a different breed. You eat, sleep, and breathe what you do. In a 160,000-person company I don&amp;rsquo;t believe there&amp;rsquo;s any way 100% of the employees decided it was a dream job, or knew at 10, or even at 16, that it was what they wanted to do with their life. But the kid that at 12 was developing his own first-person shooter game, or at 16 was following the markets and getting a summer job to be able to invest, those are of a different breed. For those, it&amp;rsquo;s not a job, it&amp;rsquo;s a portion of who they are. You don&amp;rsquo;t go home and unwind, you go home and read more blogs, dive further into your area of expertise, and work on your own projects that allow you to bring more expertise into the workplace.&lt;/p>
&lt;p>Perhaps this same attitude and drive applies in other industries, though I have the greatest insight into tech/web companies. There was a statement made by Paul Graham of Y Combinator at Startup School recently. He said that investors are looking for the type of individuals that don&amp;rsquo;t need them, the type that are going to make it regardless of what everyone else says and of who supports them. This is distinctly different from the corporate world, where you have to have validation all across the board to be able to move forward.&lt;/p>
&lt;p>As I write this and contemplate more, I believe it becomes even more of a chicken-and-egg problem. Startups require a certain type of individual, but those individuals are the ones best suited for startups.&lt;/p></description></item><item><title>Twitter will be commonplace in the enterprise</title><link>/2008/05/14/Twitter-will-be-commonplace-in-the-enterprise/</link><pubDate>Wed, 14 May 2008 21:38:41 -0700</pubDate><guid>/2008/05/14/Twitter-will-be-commonplace-in-the-enterprise/</guid><description>&lt;p>Twitter is great for a few key reasons; first let me highlight the two &lt;!-- raw HTML omitted -->key&lt;!-- raw HTML omitted --> uses I feel it has in the enterprise.&lt;/p>
&lt;!-- raw HTML omitted --></description></item><item><title>Adobe AIR is a game changer, if people would build for it...</title><link>/2008/05/08/Adobe-AIR-is-a-game-changer-if-people-would-build-for-it.../</link><pubDate>Thu, 08 May 2008 16:46:00 -0700</pubDate><guid>/2008/05/08/Adobe-AIR-is-a-game-changer-if-people-would-build-for-it.../</guid><description>&lt;p>Spend five minutes talking to me about technology or business and you&amp;rsquo;ll quickly realize I&amp;rsquo;m a fan of Adobe AIR. Adobe has done a very good job building a cross-platform runtime, and providing tools that make the transition from the web to the desktop quite minimal.&lt;/p>
&lt;p>First, to elaborate on why I like AIR. For one, I&amp;rsquo;m a fan of web 2.0; the sites feel cleaner, smoother, and drive new capabilities, versus most desktop applications, which are starting to feel old, much like 1990s Java Applets do on the web. With Flex Builder, which is an IDE for developing for flash and/or AIR, I can develop a sweet website, and then quickly port it to the desktop while maintaining a rich web 2.0 feel.&lt;/p>
&lt;p>Second, did I mention it&amp;rsquo;s cross-platform? Windows, OSX, Linux: an application in AIR will look/feel/function the same. So what? This could have been done other ways, right? Well, here is where I start to place my bets: no one else has really done anything in this area as well as Adobe so far. Also, Adobe has made their long-term strategy clear; they want to truly become a cross-platform runtime. If you&amp;rsquo;re wondering what other platforms there are, you&amp;rsquo;re not thinking large enough. They want AIR to support mobile phones, set-top boxes, likely even gaming consoles. This means I can look at one application on multiple mediums with either an identical look, or a very similar one, with minimal development effort. This allows a developer to focus even further on improving functionality.&lt;/p>
&lt;p>Finally, a bit of a rant. Adobe AIR does not have a true competitor; Silverlight is a competitor to flash (not AIR), and Google Gears might be the closest thing to it. But with regard to Gears, taking a website and making it available offline isn&amp;rsquo;t all AIR can do. AIR has local file access and local access to some devices, which you can&amp;rsquo;t do when you&amp;rsquo;re still within a browser. Oh, and one feature I personally just like is the auto-update ability of applications.&lt;/p>
&lt;p>So&amp;hellip; if it&amp;rsquo;s so great, why haven&amp;rsquo;t you heard of it and why aren&amp;rsquo;t you using it? Well, the single problem seems to be people building applications on it. I&amp;rsquo;ve seen very few applications that would appeal to mainstream users. I (being an avid fan of AIR and anything web 2.0) will typically use 2 AIR applications per day. One is twhirl, which I use as my primary twitter client. The other tends to vary by need. This contrasts with the roughly 20 applications that I work in over a given day. Most AIR applications thus far are simple, one-off fun applications. Perhaps to really make some penetration there should be some of the following:&lt;/p>
&lt;!-- raw HTML omitted --></description></item><item><title>Why Google may not exist in 8-10 years</title><link>/2008/05/08/Why-Google-may-not-exist-in-8-10-years/</link><pubDate>Thu, 08 May 2008 03:35:00 -0700</pubDate><guid>/2008/05/08/Why-Google-may-not-exist-in-8-10-years/</guid><description>&lt;p>As I write this, I write only from my vague knowledge of where revenues come directly from. However, I do hope to back this up in the future with more numeric backing. The title of this post may be quite strong and negative, but I feel it has reasonable ground to stand on, based on one simple principle: a corporation should stick to and focus on its core business, and exhausting resources outside its core business could end up costing it that business.&lt;/p>
&lt;p>As it is, Google&amp;rsquo;s primary revenue comes from advertising, and they&amp;rsquo;re the primary source for advertising because of their search engine. While some may say their core business is data, I disagree; how are they making money off of data? Data simply allows them to better target their ads. With Google&amp;rsquo;s attitude of &amp;ldquo;Do no wrong&amp;rdquo; they are unlikely to profit from the consolidation and sale of such data.&lt;/p>
&lt;p>Meanwhile Google is exhausting their resources with little to show for it. Over the past 5 years Google has been hiring some of the top talent within Silicon Valley, paying good salaries for individuals that are supposed to be some of the best software engineers in the industry. I&amp;rsquo;m not suggesting that these individuals are not sharp, but what has Google truly produced from within its own walls? Google has bought many of its outlier products, e.g. Google Earth, Google Spreadsheets, Picasa, and others. If they are to go into these areas they should strictly focus on acquisitions and keep a smaller developer base.&lt;/p>
&lt;p>From my perspective, though, Google should not be investing in any of the three products mentioned above. While cool and arguably good products, how does Google plan on making money from them? If they are pouring resources and effort into these products at the expense of improving both search and ads, they&amp;rsquo;re not focusing on their core business as they should be.&lt;/p>
&lt;p>Finally, if Google is not pouring into these two areas, search and ads, it only takes being beaten in one to be unseated. As soon as a company is able to do search better, or ads better, Google will lose the majority, if not all, of their revenue. While they may be able to stay around after someone else has become the dominant player, it will only be as a very small fraction of what they currently are.&lt;/p>
&lt;p>I&amp;rsquo;m not predicting that Google &lt;!-- raw HTML omitted -->will &lt;!-- raw HTML omitted -->be gone in 8 years, but unless they really start to devote themselves to their core business and quit wasting resources on unneeded areas, they will have something to worry about.&lt;/p></description></item><item><title>Craig's Disney Food Guide</title><link>/about/disneyworld/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/disneyworld/</guid><description>&lt;p>There&amp;rsquo;s an immense amount of information online about Disney, and especially about Disney dining. We ourselves have frequented Disney World a number of times over the years, and in time have made it to many of the restaurants there, with only a few of the fancier ones left on our list. Here are a few of our favorites, what to get, and why:&lt;/p>
&lt;h3 id="food" >
&lt;div>
Food
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Flying Fish&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Le Cellier&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="drinks" >
&lt;div>
Drinks
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;/ul></description></item><item><title>Craig's Disneyland Guide</title><link>/about/disneyland/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/disneyland/</guid><description>&lt;!-- raw HTML omitted -->
&lt;p>Disney, and specifically Disneyland, is a frequent vacation for us, as it can be a convenient long weekend getaway. It doesn&amp;rsquo;t hurt that my wife is a huge Disney fan. At the same time, there are certain things that can make it a more relaxing trip than you may realize, away from the screaming children. Rides are purely up to you, but I&amp;rsquo;ll dig into a few favorites for food and drinking.&lt;/p>
&lt;h3 id="food" >
&lt;div>
Food
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.yelp.com/biz/carthay-circle-restaurant-anaheim">Carthay Circle&lt;/a> - This is our most common visit in Disneyland. It&amp;rsquo;s a bit more fine dining, but not too overly high end as well. A few of the favorites are the biscuits, sriracha duck wings as appetizers. For entrees, mostly anything, though the pork chop is a highlight. Their wine list is also quite nice. And if you&amp;rsquo;re lucky enough to be on a corner table there&amp;rsquo;s a nice easter egg of the evil queen.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/flos-v8-cafe-anaheim">Flo&amp;rsquo;s V8 Cafe&lt;/a> - Another common place we visit, though this is what Disney classifies as quick service. This is where it shines that Disney has really stepped up its food quality though. With quick service that includes fresh salads with turkey, fresh and quality sides, or delicious pork loin with BBQ sauce its much better than the microwaved burgers of years ago.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/napa-rose-anaheim">Napa Rose&lt;/a> - On the other end of the spectrum from quick service is Disney&amp;rsquo;s much finer dining. This is an area that has always been quite nice, but has continued to be impressive. Only having the opportunity to dine here a couple times we can&amp;rsquo;t give a huge run down of the menu but will say the chef&amp;rsquo;s tasting and wine pairing is worth the experience.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/blue-bayou-anaheim">Blue Bayou&lt;/a> - Blue bayou is good food and especially filling. Though where it really wins is the ambiance. It exists within the same setting as the Pirates of the Carribean and feels exactly like you&amp;rsquo;re on a bayou at night in Lousiana. Being one of the most sought after places make sure to make reservations early for here.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/little-red-wagon-anaheim">Little Red Wagon&lt;/a> - If you want something quick, but amazing this is the go to. True old fashioned style corn dogs, fresh dipped in batter for hot, greasy deliciousness. Lines can get a little long here, but it moves quickly so don&amp;rsquo;t fret too much.&lt;/li>
&lt;/ul>
&lt;h3 id="drinking" >
&lt;div>
Drinking
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.yelp.com/biz/carthay-circle-restaurant-anaheim">Carthay Circle Lounge&lt;/a> - Carthay Circle is amazing for good, and their lounge is the spot for drinks, though their light dishes are quite nice as well. Their wine list is pretty good, though their cocktails are what you want here. Whether your preference is gin, whiskey, or champagne there&amp;rsquo;s a cocktail that will make all the screaming kids so much more bearable. They also have pretty good small bites here.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/cove-bar-anaheim">Cove Bar&lt;/a> - On a warm day this is a great spot to sit and enjoy the pier area. They&amp;rsquo;ve got a few off menu drinks that make it both an adult but fun disney experience such as the black pearl (a long island with a twist). And if you&amp;rsquo;re hungry their lobster nachos are great.&lt;/li>
&lt;/ul></description></item><item><title>Craig's Huntsville Guide</title><link>/about/huntsville/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/huntsville/</guid><description>&lt;!-- raw HTML omitted -->
&lt;h3 id="food" >
&lt;div>
Food
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://poboyfactory.com/">Po Boy Factory&lt;/a> - This is a regular stop anytime I&amp;rsquo;m back in Huntsville. No pomp and circumstance at all, just great New Orleans food that measures up to much of it actually in New Orleans.&lt;/li>
&lt;li>&lt;a href="http://gibsonsbbq.com/">Gibsons&lt;/a> - A clear staple of the south is BBQ, Gibsons does it quite well. The pulled pork is great, though my favorite is probably hitting up this place for breakfast. A good country ham biscuit which is just not a thing in most other parts of the country. Though if there for lunch or dinner make sure to pick up a slice of lemon icebox.&lt;/li>
&lt;li>&lt;a href="https://plus.google.com/106762386505884740495/about?gl=us&amp;amp;hl=en">Little Paul&amp;rsquo;s&lt;/a> - Within the family of the Gibson&amp;rsquo;s, Little Paul is the little brother. Very similar overall to Gibson&amp;rsquo;s, but in particular their smoked turkey here is amazing. Make sure to use the white barbecue sauce with it.&lt;/li>
&lt;li>&lt;a href="">Mezza Luna&lt;/a>&lt;/li>
&lt;li>&lt;a href="">Bob Baumhauer&amp;rsquo;s Wings&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="drinking" >
&lt;div>
Drinking
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="">Amendment 21&lt;/a>&lt;/li>
&lt;li>&lt;a href="">Below the Radar&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="what-to-do" >
&lt;div>
What to Do
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.earlyworks.com/">Early Works&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://rocketcenter.com/">Space and Rocket Center&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Craig's New Orlenas Guide</title><link>/about/neworleans/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/neworleans/</guid><description>&lt;!-- raw HTML omitted -->
&lt;h3 id="drinking--food" >
&lt;div>
Drinking / Food
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.thethreemuses.com/">Three Muses&lt;/a> - One of the best bars in the area. Great cocktails including orange blossom sazerac. Their food is quite great as well, amazing things had there – lobster spring rolls, crispy pork belly, fish tacos, edamame with toasted sesame oil and star anise&lt;/li>
&lt;li>&lt;a href="http://www.jacksonbrewerybar.com/">Jackson Brewery&lt;/a> - As with many a pretty solid brewery with good food. The few isn&amp;rsquo;t quite traditional bar food, slants more to new orleans cuisine, but definitely have a few good items. The seared tuna salad is quite the helping of tuna and blackened alligator is delcious.&lt;/li>
&lt;li>&lt;a href="http://www.patobriens.com/patobriens/">Pat Obriens&lt;/a> - Well known for creating the hurricane. Their drinks are definitely sweet most and not the place for bargain drinks, but an evening in the courtyard or near the piano bar is a must do experience at least once if not many times in your lifetime.&lt;/li>
&lt;li>&lt;a href="http://abita.com/">Abita&lt;/a> - Its a bit of a drive out of town, but a nice different experience. All of their beers are of course nice and fresh here and a good selection of quality bar food here as well.&lt;/li>
&lt;/ul>
&lt;h3 id="food-only" >
&lt;div>
Food Only
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.saveur.com/article/Travels/saveur-100-crabby-jacks">Crabby Jacks&lt;/a> - The place for a Po Boy. Their King Po Boy is ~ $12 and probably has 2 pounds of perfectly fried shrimp on it. Feels like a New Orleans &lt;a href="http://www.yelp.com/biz/city-cafe-northport">City Cafe&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.cafedumonde.com/">Cafe Du Monde&lt;/a> - Sure its a given, but its still great. Beignets and Cafe Au Lait are a must anytime in new orleans.&lt;/li>
&lt;li>&lt;a href="http://www.meltdownpops.com/">Meltdown popsicles&lt;/a> - In several days I walked by this place many times before even realizing it was there, but what a treat. Gourmet popsicles, i.e. strawberry basil, vietnamese coffee, pineapple cilantro&lt;/li>
&lt;li>&lt;a href="http://www.angelobrocatoicecream.com/">Angelo Brocato Gelato&lt;/a> - As I right this I&amp;rsquo;m returning from Florence Italy. There&amp;rsquo;s not many places I could say even come close to the gelato I consumed this past week, this place though was truly great. Of course still not quite as good, but it does come close.&lt;/li>
&lt;/ul>
&lt;h3 id="shops" >
&lt;div>
Shops
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.fleurdeparis.net/">Fleur de Paris&lt;/a> - Awesome true milinery. No photos allowed. Hats are $$$$ but dresses can be fun and affordable for an amazing quality.&lt;/li>
&lt;li>&lt;a href="http://www.goorin.com/">Goorin Brothers&lt;/a> - With something that has a little more for the men as well here&amp;rsquo;s another hat shop. This has a variety of very style-ish both mens and womens hats if you can find the occasion to wear them.&lt;/li>
&lt;li>&lt;a href="http://www.calicheandpao.com/">Caliche and Pao&lt;/a> - For a city filled with local art and music this studio always jumps out to me. So much so I had to grab his well known set of light posts. The color and emotion within it captures the city so well.&lt;/li>
&lt;/ul></description></item><item><title>Craig's Paso Robles Guide</title><link>/about/pasorobles/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/pasorobles/</guid><description>&lt;!-- raw HTML omitted -->
&lt;h3 id="wineries" >
&lt;div>
Wineries
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.lonemadrone.com">Lone Madrone&lt;/a> - Lone madrone is one of my favorite wineries in that area and possibly period. In particular they focus a bit more on blends and being a little crazier, their winemaker Neil is also the winemaker at Tablas Creek where he makes a slightly more traditional Rhone style. Their new space makes it even better overall, with burgers and music often on weekends. They also do a variety of ciders, which are all very well done and can be a nice change of scenary to all the wine. Though don&amp;rsquo;t be mistaken despite great burgers delicious cider the wines are truly the highlight. A few personal favorites are The Dodd, The Will and the Tannat, though nearly all are delectable. Tasting here is $5, bottles range from $10 for cider $20 for wine up to $60ish.&lt;/li>
&lt;li>&lt;a href="http://www.lcwine.com">Le Cuvier&lt;/a> - Le Cuvier was a rare gem we immediately found in Paso. Their wines are a bit more on the lighter side over all with some acidity coming through, but at the same time very unique. Their pairing is all done with food and the food goes &lt;em>perfectly&lt;/em> with the wine. If you like wines that are meant to be done with food or slightly lighter/acidic overall its well worth the visit. Tasting is complimentary wines range from $30 towards $70 with most in the $50 range.&lt;/li>
&lt;li>&lt;a href="http://www.shaleoakwinery.com">Shale Oak&lt;/a> - Shale oak is newer on the list for me, upon entering it had a much more elaborate feel to many others in Paso Robles. They have a strong focus on sustainability, which is expressed in a nice flair of their tasting menu having seeds made into it which can sprout flowers. Though for all the focus on sustainability its not lost on their wines. Straight down the menu was delicious with their white blend, cab (lighter overall compared to bolder napa ones), and petite sirah all standing out. Tasting is $5, bottles range from $20 to $40.&lt;/li>
&lt;li>&lt;a href="http://www.derbywineestates.com">Derby&lt;/a> - Derby is much more of a grower than wine maker, making wine from only about 10% of the grapes themselves, however what they do make is all equisite. Several were in a bit of a Rhone style and done quite well, overall lighter on the earth in many cases. Their white was a great surprise more interesting than most, as was their sparkling. Tasting is $5, bottles range from $15 to $60.&lt;/li>
&lt;li>&lt;a href="http://www.clayhousewines.com">Clayhouse&lt;/a> - Clayhouse was largely the reason we started heading down to Paso Robles. Looking for something different from the common Sonoma/Napa wines we were drawn to them strictly for their Petite Sirah which is an extremely valuable buy for as delicious as it is. They have several others that I&amp;rsquo;d put within the good every day drinking range, as well as some nicer ones that hold up well to other wineries such as their reserve Petite Sirah and Malbec. Another great convenience is they&amp;rsquo;re located on the square and open later to make it a great end of day stop once you shouldn&amp;rsquo;t be driving. Tasting is $5, bottles range from $10 to $50&lt;/li>
&lt;li>&lt;a href="http://www.tablascreek.com">Tablas Creek&lt;/a> - With the same winemaker as Lone Madrone its not a surprise that many of these are great, it also probably doesn&amp;rsquo;t hurt that many of their grapes came from Chateau de Beaucastel. Tablas Creek focuses especially on Rhone varietals and dry farming as much as they can in similar fashion to many Rhone wineries. Commonly their rated well in wine reviews, and its clear why with certain ones. While all of theirs don&amp;rsquo;t stand out the same on the list, over half of them delight everytime I&amp;rsquo;m there. Tasting is $5, bottles range from $15 to $80&lt;/li>
&lt;li>&lt;a href="http://www.westbergcellars.com">TurtleRock&lt;/a> - Turtle rock was actually one of two labels by the wine maker. The turtle rock wines stood out a bit more, in particular their Rose was the highlight. A dry, but with great floral-ness and strawberry on the nose rose this was probably the favorite of many we&amp;rsquo;ve had down in Paso Robles.&lt;/li>
&lt;li>&lt;a href="http://www.arroyorobles.com/">Arroyo Robles&lt;/a> - Arroyo robles is often hit or miss for us. Often times its been great, such times include when they had their sparkling, their almond sparkling (sweet, but could pair great with a weekend brunch), their grenache which has some awesome spice. Though sometimes when they have a shorter list while still good some its not quite the same. Their Rose is a fun one in that its more orange than pink. Tasting is $5, bottles range from $15 to $50&lt;/li>
&lt;li>&lt;a href="">Assucion Ridge&lt;/a>&lt;/li>
&lt;li>&lt;a href="">Aaron&lt;/a>&lt;/li>
&lt;/ul>
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted -->
&lt;!-- raw HTML omitted -->
&lt;h3 id="food" >
&lt;div>
Food
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.berryhillbistro.com">Berry Hill Bistro&lt;/a> - Berry Hill is one of our regular stops when down in the area. Most commonly for lunch, a fairly casual bistro overall their sandwiches and salads are all delicious. Among delicious ones we&amp;rsquo;ve enjoyed were the Ahi Tuna, French Dip, and Swordfish steak, though nothing on the menu is likely to let you down.&lt;/li>
&lt;li>&lt;a href="http://www.thomashillorganics.com">Thomas Hill Organics&lt;/a> -&lt;/li>
&lt;li>&lt;a href="">Yanagi&lt;/a> - A reasonable sushi place overall. There&amp;rsquo;s nothing in particular that stands out about it, their rolls are more creative than traditional. Fish has always been fresh.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/amsterdam-coffee-house-paso-robles">Amsterdam Coffee House&lt;/a> - This is a regular stop for us at least once often more when we visit. Their coffee is great, and the overall feel of the place is even better. Good choices in music, ample couch space, and great breakfast sandwiches make it an easy way to get going for the day.&lt;/li>
&lt;li>&lt;a href="http://www.weolive.com">We Olive&lt;/a> - This was a bit of a surprise find as we were searching for a place to get some good aged balsamic vinegar. The result was leaving with that plus some great olive oil and banana caramel. They&amp;rsquo;ve got nearly everything in the store available for tasting, ranging from olive oils to marinades.&lt;/li>
&lt;/ul></description></item><item><title>Craig's San Francisco Guide</title><link>/about/sf/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/sf/</guid><description>&lt;h3 id="drinking" >
&lt;div>
Drinking
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.yelp.com/biz/smugglers-cove-san-francisco#query:smugglers%20cover">Smugglers Cove&lt;/a> - great rum bar. Punch bowls lit on fire.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/83-proof-san-francisco">83 Proof&lt;/a> - Best cocktails in the city, if you go try a basil gimlet if you like basil at all and have not had one before.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/city-beer-store-san-francisco">City Beer Store&lt;/a> - if you go one place for beer it has to be here. Its a beer store that also has about 10 taps, they&amp;rsquo;ve had beers made for them in collaboration with breweries before.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/the-trappist-oakland">The Trappist&lt;/a> - Its in oakland, but convenient off of BART, would be second on my list likely of places to get beers at. I am to get here about once a month if possible, great beers of all varieties.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/21st-amendment-brewery-san-francisco">21st Amendment&lt;/a> - I&amp;rsquo;m personally mixed on 21st amendment they do make some good beers and some people absolutely love the place. Probably the biggest up and coming brewery we have.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/anchor-brewing-company-san-francisco#query:21st%20Amendment%20Brewery">Anchor Brewing Company&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/hangar-one-alameda-3#query:distillery">Hangar One Distillery&lt;/a> if you&amp;rsquo;re willing to take a ferry ride this is a great time, its about $10 to taste nearly 10 different liquors. They make hangar one vodka, lots of gins, some liquers (one with blue bottle coffee), and they were the first place to start making absinthe when it was legal again in the US.&lt;/li>
&lt;/ul>
&lt;h3 id="beer-and-food" >
&lt;div>
Beer and Food
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.yelp.com/biz/la-trappe-san-francisco-2">La Trappe&lt;/a> - Great beer and food place, mostly of the belgian varietal. The beer selection is a good list, then they have a full binder of whats in bottles that you can order. If you go here the fries are a must and the belgian mayo goes great with them (the have something like 16 different sauces for the fries).&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/suppenk%C3%BCche-san-francisco-2#query:walzwek">Suppenkuche&lt;/a> - german food and boots, theres also the biergarten next door which if weather is nice its a great place to stop.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/the-monks-kettle-san-francisco">Monks Kettle&lt;/a> - great beers, have only have the food once, was good but pricey for quality. Place is small and can get crowded.&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/rogue-ales-public-house-san-francisco#query:21st%20Amendment%20Brewery">Rogue Public Alehouse&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/toronado-san-francisco#query:21st%20Amendment%20Brewery">Toronado&lt;/a> - food is from &lt;a href="(http://rosamundesausagegrill.com/)">a sausages place&lt;/a> thats excellent - this is definitely a very affordable and great food option with enough food to fill you up for &amp;lt; $10 often&lt;/li>
&lt;li>&lt;a href="#">Pyramid Brewery&lt;/a> - Over by us in east bay a bart ride away but doable, standard pub food, but done pretty well.&lt;/li>
&lt;li>&lt;a href="#">Elevation 66&lt;/a> - Over by us as well, a super small microbrewery, they have great food as well.&lt;/li>
&lt;/ul>
&lt;h3 id="food" >
&lt;div>
Food
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="http://www.yelp.com/biz/osha-thai-san-francisco-3">OSHA&lt;/a> - great thai food&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/stone-korean-kitchen-san-francisco-2">Stone Korean Kitchen&lt;/a> - korean, the kimchi friend rice is great&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/zero-zero-san-francisco">Zero Zero&lt;/a> - pizza, thin crust style but San Francisco as well as in a bit foodie&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/colosseo-ristorante-and-bar-italiano-san-francisco">Colosseo&lt;/a> - Italian place in north beach have eaten at a few times, quite a few people ate here when out at the wedding and loved it&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/sotto-mare-san-francisco">Sotto Mare&lt;/a> - Smaller italian restaurant, more seafood. Feels like your italian grandmother serving you, when we had to wait 5 minutes they were immediately pouring us wine on the house and chatty/friendly as ever. (small)&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/fish-sausalito">Fish&lt;/a> - best seafood I&amp;rsquo;ve ever had, a laid back place but can be pricier with entrees anywhere from ~ $20 and up&lt;/li>
&lt;li>&lt;a href="http://www.yelp.com/biz/walzwerk-san-francisco#query:walzwek">Walzwerk&lt;/a> - tiny little east german place with well good german beers. Not super pricey, but many entrees ~ $15 range and up&lt;/li>
&lt;li>SOMA Street Food Park - A food truck area that has about 10 trucks for lunch and 10 for dinner. Lots of variety here, plus heaters, TVs, wifi, picnic benches, and beer. It&amp;rsquo;s a very SF take on food trucks.&lt;/li>
&lt;li>&lt;a href="http://www.dynamodonut.com/">Dynamo Donuts&lt;/a> - Must try the bacon maple apple donut.&lt;/li>
&lt;li>&lt;a href="http://www.kijirestaurant.com/">Kiji&lt;/a> - California fusion sushi, really good sushi with some fun twists.&lt;/li>
&lt;/ul>
&lt;h3 id="activities" >
&lt;div>
Activities
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="#">Exploratorium&lt;/a>&lt;/li>
&lt;li>&lt;a href="#">California academy of sciences&lt;/a>&lt;/li>
&lt;li>&lt;a href="#">Golden gate park&lt;/a>&lt;/li>
&lt;li>&lt;a href="#">Dolores park&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Craig's Sonoma and Napa Wine Guide</title><link>/about/wine/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/wine/</guid><description>&lt;p>As a frequent visitor to the Sonoma/Napa area and a wine drinker, I often guide friends and visitors when planning their wine country trip. Even when not playing tour guide I&amp;rsquo;ll give advice to those looking for a wine tasting experience. To simplify this I&amp;rsquo;ve written up some notes on many of the places I&amp;rsquo;ve visited on several occasions, and have a set of recommended agendas over &lt;a href="/about/wine_route/">here&lt;/a> based on your preferences.&lt;/p>
&lt;h3 id="imagery-winery" >
&lt;div>
Imagery Winery
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="http://www.imagerywinery.com/">Website&lt;/a> | &lt;a href="http://www.yelp.com/biz/imagery-estate-winery-glen-ellen">Yelp&lt;/a>&lt;/p>
&lt;p>Large variety, great to expose those newer to wines to some approachable ones as well as having something for the experienced wine drinker. They&amp;rsquo;re often a bit more creative with their wines, sometimes doing grapes that are commonly blended by themselves and other times doing creative blends such as their code blue which is 80% Syrah and 20% blueberry wine (a wine that smells blueberry, tastes like a solid Syrah and finishes like blueberry, but not sweet at any point).&lt;/p>
&lt;ul>
&lt;li>Tasting ranges from $10-$20&lt;/li>
&lt;li>Bottles range from $20-$80&lt;/li>
&lt;/ul>
&lt;h3 id="benziger-winery" >
&lt;div>
Benziger Winery
&lt;/div>
&lt;/h3>
&lt;p>Benziger, sister winery to Imagery, follows a more traditional approach to their wines. Here you&amp;rsquo;ll find great quality Chardonnay, Pinot, Bordeaux blends, and Cabs. They have a large emphasis on biodynamic wines (the next step up from organic) and are a great way to experience a variety of wines made in a traditional form. The grounds are also gorgeous, providing a great place to picnic or enjoy a brick oven pizza from them.&lt;/p>
&lt;ul>
&lt;li>Tasting ranges from $10-$20&lt;/li>
&lt;li>Bottles range from $20-$80&lt;/li>
&lt;/ul>
&lt;h3 id="kaz" >
&lt;div>
Kaz
&lt;/div>
&lt;/h3>
&lt;p>The smallest winery in Sonoma Valley, it embodies much of what Sonoma is compared to Napa. The owner and winemaker is often behind the bar guiding you through the menu, and he alone is a great experience. As for the wines themselves, their reds are generally good but nothing to write home about, typically in the category of a good table wine. However, they do have a great selection of port, in particular a white, blush, and red port, with the white port most recently tasting of hazelnut.&lt;/p>
&lt;ul>
&lt;li>Tasting $5&lt;/li>
&lt;li>Bottles range from $15-$40&lt;/li>
&lt;/ul>
&lt;h3 id="gundlasch-bundshu" >
&lt;div>
Gundlasch Bundshu
&lt;/div>
&lt;/h3>
&lt;p>Gundlasch Bundshu is the oldest winery in Sonoma. They make many traditional wines, including Chardonnay, Merlot, and Pinot Noir. They also have a few that are done in a less common way, such as their Gewürztraminer done in an off-dry fashion, which is enjoyable for those that like sweet wines but also approachable for those that do not. It&amp;rsquo;s a bit more nestled away than some of the other Sonoma wineries and provides a good experience of being hidden away from it all.&lt;/p>
&lt;ul>
&lt;li>Tasting $10&lt;/li>
&lt;li>Bottles range from $20-$80&lt;/li>
&lt;/ul>
&lt;h3 id="sojourn-cellars" >
&lt;div>
Sojourn Cellars
&lt;/div>
&lt;/h3>
&lt;p>Sojourn is more of the Napa experience within Sonoma: a private tasting by reservation only. They&amp;rsquo;ll take you through their current releases of Pinot and Cab, typically 3 of each. The tasting room is just off the Sonoma square and feels as if you&amp;rsquo;re sitting around someone&amp;rsquo;s dining room table. For Pinot or Cab lovers, both are great here.&lt;/p>
&lt;ul>
&lt;li>Tasting complimentary&lt;/li>
&lt;li>Bottles range from $40 to $70&lt;/li>
&lt;/ul>
&lt;h3 id="cline" >
&lt;div>
Cline
&lt;/div>
&lt;/h3>
&lt;p>Cline is one of the larger producers within Sonoma, and their wines are something you can generally find at local markets. The wines are generally approachable for those newer to wine, though there are also some for the more experienced. In particular, they have a large variety of Zins ranging from fruity to spicy. They also have a variety of birds and animals on the property around their pond, making it a good place to entertain kids for a bit.&lt;/p>
&lt;ul>
&lt;li>Tasting from $0 to $5&lt;/li>
&lt;li>Bottles range from $15 to $40&lt;/li>
&lt;/ul>
&lt;h3 id="jacuzzi" >
&lt;div>
Jacuzzi
&lt;/div>
&lt;/h3>
&lt;p>Jacuzzi is the sister winery to Cline; with Cline focusing on a more French style, Jacuzzi focuses on a more Italian style. In addition to the traditional Sangiovese, the primary grape in Chianti, you&amp;rsquo;ll find Italian wines that are less common in California, including Dolcetto, Nebbiolo, and Sagrantino. If you&amp;rsquo;re a fan of Italian wines this can be a good stop, and with a large variety it&amp;rsquo;s likely to include something for people newer to wine as well. Also of note: they typically have a large selection of olive oils to taste for the non-drinkers in the party.&lt;/p>
&lt;ul>
&lt;li>Tasting from $0 to $5&lt;/li>
&lt;li>Bottles range from $15 to $40&lt;/li>
&lt;/ul>
&lt;h3 id="viansa" >
&lt;div>
Viansa
&lt;/div>
&lt;/h3>
&lt;p>Falling somewhere between Jacuzzi, with a strong selection of Italian wines, and the more traditional California wineries, with selections such as Cab, Chardonnay, and Zin, this winery can have something for most. It also offers great views looking out over part of Sonoma Valley. If stopping here, try to do it earlier in the day; being one of the first places you encounter, it&amp;rsquo;s likely to be crowded as the day progresses.&lt;/p>
&lt;ul>
&lt;li>Tasting from $10&lt;/li>
&lt;li>Bottles range from $20 to $60&lt;/li>
&lt;/ul>
&lt;h3 id="gloria-ferrer" >
&lt;div>
Gloria Ferrer
&lt;/div>
&lt;/h3>
&lt;p>One of the few places in Sonoma with a focus on champagne, it also offers great views. While many of their champagnes can be found in local markets, the ones sold here are far less available. Their champagnes are sold by the glass, not via the traditional wine tasting method of sampling several; this makes it easier to enjoy the view from their patio looking out over part of Sonoma Valley.&lt;/p>
&lt;ul>
&lt;li>Glasses from $5-$15&lt;/li>
&lt;li>Bottles from $15-$60&lt;/li>
&lt;/ul>
&lt;h3 id="enkidu" >
&lt;div>
Enkidu
&lt;/div>
&lt;/h3>
&lt;p>Enkidu is a hidden spot in much of Sonoma, as it&amp;rsquo;s a smaller tasting room that doesn&amp;rsquo;t draw much attention to itself. The person behind the bar will be one of a small handful of people who are very involved with the winery. Their wines can be enjoyed by many, but for more experienced wine drinkers who enjoy more body they have a great selection of Petite Sirah.&lt;/p>
&lt;ul>
&lt;li>Tasting $10&lt;/li>
&lt;li>Bottles from $20-$50&lt;/li>
&lt;/ul>
&lt;h3 id="st-francis" >
&lt;div>
St. Francis
&lt;/div>
&lt;/h3>
&lt;p>St. Francis has a more common selection of California wines with a special focus on Zins and Cabs, offering great quality in both. In addition to a standard tasting they have some options for food and wine pairings to allow for a bit more of a special experience. The winery itself is reminiscent of a European castle on a much smaller scale; while a bit isolated, it feels like it&amp;rsquo;s been pulled right out of Napa.&lt;/p>
&lt;ul>
&lt;li>Tasting $10-$30&lt;/li>
&lt;li>Bottles $20-$80&lt;/li>
&lt;/ul>
&lt;h3 id="chateau-st-jean" >
&lt;div>
Chateau St. Jean
&lt;/div>
&lt;/h3>
&lt;p>Chateau St. Jean is also a more traditional winery, though they offer a long and broad selection. While you can find many of their wines in stores, the selection at the winery goes well beyond that. With large grounds it&amp;rsquo;s a great spot for photos and to relax for a bit.&lt;/p>
&lt;ul>
&lt;li>Tasting $5-$20&lt;/li>
&lt;li>Bottles $20-$80&lt;/li>
&lt;/ul>
&lt;h3 id="kenwood" >
&lt;div>
Kenwood
&lt;/div>
&lt;/h3>
&lt;p>Kenwood, also commonly found in stores, has a much broader selection as well. While most of their wines are distributed to stores, with about 30 wines on the tasting menu the winery offers a great chance to try a large variety. In particular their Reserve and Jack London series offer a variety of Cabs and Merlots at a price much more approachable than many Napa wineries.&lt;/p>
&lt;ul>
&lt;li>Tasting $5&lt;/li>
&lt;li>Bottles $10-$60&lt;/li>
&lt;/ul>
&lt;h3 id="mayo-family" >
&lt;div>
Mayo Family
&lt;/div>
&lt;/h3>
&lt;p>A smaller family winery with its tasting room separate from the vineyard, it&amp;rsquo;s a nice spot for a variety of wines. It&amp;rsquo;s one of a few places serving champagne, and it also has a couple of choices of dessert wines for those who prefer them.&lt;/p>
&lt;ul>
&lt;li>Tasting $5&lt;/li>
&lt;li>Bottles $15-$50&lt;/li>
&lt;/ul>
&lt;h3 id="audelessa" >
&lt;div>
Audelessa
&lt;/div>
&lt;/h3>
&lt;p>Audelessa offers a more limited list, typically pouring only 5 wines on a given day, though those 5 can be well worth the trip. For Pinot or Cab lovers this is a great stop, though they do venture beyond those as well. Additionally, if you&amp;rsquo;re tired from standing at a counter all day, they have some comfy seats, assuming they&amp;rsquo;re not all full.&lt;/p>
&lt;ul>
&lt;li>Tasting $10&lt;/li>
&lt;li>Bottles $25-$60&lt;/li>
&lt;/ul>
&lt;h3 id="highway-12" >
&lt;div>
Highway 12
&lt;/div>
&lt;/h3>
&lt;p>Located right on the downtown square and sharing space with a store, this one is easy to miss, though it&amp;rsquo;s a shame to have that happen. They have a range of wines but focus mostly on their Cab and Bordeaux blends, ranging from approachable table wines up to their reserve Bordeaux. In particular their regular Bordeaux blend and Cab are an amazing bang for the buck that makes you not feel bad opening one on a weeknight.&lt;/p>
&lt;ul>
&lt;li>Tasting complimentary to $5&lt;/li>
&lt;li>Bottles $10 to $80&lt;/li>
&lt;/ul>
&lt;h3 id="adobe-road" >
&lt;div>
Adobe Road
&lt;/div>
&lt;/h3>
&lt;p>Located on the square with a range from beginner wines to some big Cabs, it has something approachable for everyone. It can often get crowded, so getting there earlier is advisable.&lt;/p>
&lt;ul>
&lt;li>Tasting $10 to $20&lt;/li>
&lt;li>Bottles $10 to $80&lt;/li>
&lt;/ul>
&lt;h3 id="hawkes" >
&lt;div>
Hawkes
&lt;/div>
&lt;/h3>
&lt;p>Just off the square, this one is slightly more hidden than some, but more discoverable than others. With a short list at any given time, they have a strong focus on Cabs, and it&amp;rsquo;s a great spot if you want to enjoy 2-3 within a tasting.&lt;/p>
&lt;ul>
&lt;li>Tasting $10&lt;/li>
&lt;li>Bottles $20-$80&lt;/li>
&lt;/ul>
&lt;h3 id="conn-creek" >
&lt;div>
Conn Creek
&lt;/div>
&lt;/h3>
&lt;p>Hidden off the main road of Napa, this place has a more approachable feel than most Napa wineries. Despite not having an ornate entrance, they deliver the same quality Cabs that so many other Napa wineries do at a fraction of the price. If you&amp;rsquo;re in Napa strictly for the quality of the wines and want a place where you can relax, this is a great stop.&lt;/p>
&lt;ul>
&lt;li>Tasting $5-$10&lt;/li>
&lt;li>Bottles $20-$60&lt;/li>
&lt;/ul>
&lt;h3 id="flora-springs" >
&lt;div>
Flora Springs
&lt;/div>
&lt;/h3>
&lt;p>Flora Springs is a pretty typical Napa experience: high ceilings, beautiful decor, and wines to match what you&amp;rsquo;d expect. With a great selection of Cabs and other quality wines, it&amp;rsquo;s a place to come if you want the stereotypical Napa experience. You&amp;rsquo;ll hear a bit more about the region and vineyard than about the taste of the wine, but that doesn&amp;rsquo;t mean the wines are lacking.&lt;/p>
&lt;ul>
&lt;li>Tasting $10-$20&lt;/li>
&lt;li>Bottles $25-$100&lt;/li>
&lt;/ul>
&lt;h3 id="frogs-leap" >
&lt;div>
Frogs Leap
&lt;/div>
&lt;/h3>
&lt;p>Frogs Leap is a pretty common Napa winery, though a bit more laid back than many. With a variety of great wines, they rotate what they&amp;rsquo;re pouring pretty frequently, ranging from Rose to Zin to a common Napa Cab. Most wines are great quality, and you can enjoy them in a variety of ways, from a sit-down tasting to outdoors while playing cornhole.&lt;/p>
&lt;ul>
&lt;li>Tasting $10-$30&lt;/li>
&lt;li>Bottles $20-$100&lt;/li>
&lt;/ul>
&lt;h3 id="pina" >
&lt;div>
Pina
&lt;/div>
&lt;/h3>
&lt;p>Pina is what you expect in quality from great Napa Cabs, but without so much of the pomp and circumstance. You taste in what is essentially a large warehouse that has a bit of a Kaz feel to it. The people behind the counter are usually older and simply there to pour and let you enjoy. The colors of their Cabs go from dark ink to dark ink dyed with dark ink. The selection is mostly Cabs, though they have a Chardonnay and a late harvest as well.&lt;/p>
&lt;ul>
&lt;li>Tasting ~ $15&lt;/li>
&lt;li>Bottles $50 - $100&lt;/li>
&lt;/ul>
&lt;h3 id="vjb" >
&lt;div>
VJB
&lt;/div>
&lt;/h3>
&lt;p>For some time I&amp;rsquo;d passed VJB up as it seemed very touristy, though I&amp;rsquo;m sad it took so long to visit. They have several Italian varietals, and of the wineries on my list doing that style they&amp;rsquo;re quite possibly the best. They don&amp;rsquo;t have anything quite in the range of a Super Tuscan or Brunello, staying more on the softer side of Italian wines, but they do them very well. It&amp;rsquo;s also a great spot to order a pizza or something from their cafe and enjoy the area.&lt;/p>
&lt;ul>
&lt;li>Tasting ~ $10&lt;/li>
&lt;li>Bottles $20 - $50&lt;/li>
&lt;/ul>
&lt;h3 id="iron-horse" >
&lt;div>
Iron Horse
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="http://www.ironhorsevineyards.com/">Website&lt;/a> - &lt;a href="http://www.yelp.com/biz/iron-horse-vineyards-sebastopol">Yelp&lt;/a>&lt;/p>
&lt;p>Among Gloria Ferrer, Domaine Carneros, and Chandon, Iron Horse is hands down my favorite. They do both still and sparkling wines, though I can only comment on their sparkling. Getting there, you feel as if you&amp;rsquo;re entirely off the beaten path. The tasting area is outside and overlooks the valley with gorgeous views. Their champagnes are great quality, and many may be harder to find in stores, including one commissioned especially by Disney for distribution in their parks, which is quite great.&lt;/p>
&lt;ul>
&lt;li>Tasting ~ $10&lt;/li>
&lt;li>Bottles $30 - $80&lt;/li>
&lt;/ul>
&lt;h3 id="morlet" >
&lt;div>
Morlet
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="http://www.morletwine.com/">Website&lt;/a> | &lt;a href="http://www.yelp.com/biz/morlet-family-vineyards-st-helena">Yelp&lt;/a>&lt;/p>
&lt;p>This is the most impressive wine experience I&amp;rsquo;ve ever had, and at the same time an extremely relaxed one. Tastings are by appointment only, and you should plan a few weeks out. The winemaker is a fourth-generation winemaker and grower from France, and he&amp;rsquo;s the winemaker for several other great quality wineries in the area as well. Easily some of the best wine I&amp;rsquo;ve tasted in my life, though it also comes with a price to match. Wines of particular note include his white Bordeaux, his Syrah, which smells just like freshly cracked black pepper, and every single one of his Cabs. A final note: the tasting is always conducted by him or his wife, both of whom are a wonderful experience and very different from each other.&lt;/p>
&lt;ul>
&lt;li>Tasting ~ $100&lt;/li>
&lt;li>Bottles $65-$300&lt;/li>
&lt;/ul>
&lt;h3 id="salvestrin" >
&lt;div>
Salvestrin
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="http://www.salvestrinwinery.com/">Website&lt;/a> | &lt;a href="http://www.yelp.com/biz/salvestrin-saint-helena">Yelp&lt;/a>&lt;/p>
&lt;p>This was an impromptu visit for us; they generally encourage a reservation, though it seems easy enough to just stop in. The environment was very relaxed, a nice break from so much of Napa. Their lower-end wines were definitely the ones that excelled. The higher-end Cab and estate wines were of good quality but didn&amp;rsquo;t quite hold up to some of the outstanding others in the Napa area. The great thing was their lower-end wines stood out as great compared to so many surrounding wineries.&lt;/p>
&lt;ul>
&lt;li>Tasting ~ $20&lt;/li>
&lt;li>Bottles $25-$100&lt;/li>
&lt;/ul>
&lt;h3 id="lasseter" >
&lt;div>
Lasseter
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="http://www.lasseterfamilywinery.com/">Website&lt;/a> | &lt;a href="http://www.yelp.com/biz/lasseter-family-winery-glen-ellen">Yelp&lt;/a>&lt;/p>
&lt;p>If you love anything about Disney or Pixar, what&amp;rsquo;s not to love about a winery from John Lasseter himself? In truth the winery is mostly a labor of love by his wife, or as he describes it, &amp;ldquo;her movie&amp;rdquo;. Appointment-only definitely holds true here, but as with most places that require an appointment, it&amp;rsquo;s worth the coordination. Their tasting experience includes a quick walk through the grounds and barrel room, followed by a great tasting paired with cheeses and chocolates. They focus exclusively on Rhone-style blends, all of which are quite excellent.&lt;/p>
&lt;ul>
&lt;li>Tasting ~ $25&lt;/li>
&lt;li>Bottles $25-$60&lt;/li>
&lt;/ul></description></item><item><title>Craig's Wine Tasting Routes</title><link>/about/wine_route/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/wine_route/</guid><description>&lt;p>For those interested in tasting in Wine Country but who don&amp;rsquo;t want to read through the &lt;a href="/about/wine.html">full list&lt;/a> and determine their own agenda, here are a few pre-set routes that can help:&lt;/p>
&lt;h2 id="personal-favorites" >
&lt;div>
Personal Favorites
&lt;/div>
&lt;/h2>
&lt;ul>
&lt;li>Imagery&lt;/li>
&lt;li>Enkidu&lt;/li>
&lt;li>Benziger&lt;/li>
&lt;li>Sojourn&lt;/li>
&lt;/ul>
&lt;h2 id="by-taste" >
&lt;div>
By Taste
&lt;/div>
&lt;/h2>
&lt;h3 id="wine-beginner" >
&lt;div>
Wine Beginner
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Cline&lt;/li>
&lt;li>Jacuzzi&lt;/li>
&lt;li>Gloria Ferrer&lt;/li>
&lt;li>Imagery&lt;/li>
&lt;/ul>
&lt;h3 id="zin" >
&lt;div>
Zin
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Cline&lt;/li>
&lt;li>Kenwood&lt;/li>
&lt;li>St. Francis&lt;/li>
&lt;/ul>
&lt;h3 id="cabsbordeaux" >
&lt;div>
Cabs/Bordeaux
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Benziger&lt;/li>
&lt;li>Hawkes&lt;/li>
&lt;li>Highway 12&lt;/li>
&lt;li>Sojourn&lt;/li>
&lt;/ul>
&lt;h3 id="petite-sirah" >
&lt;div>
Petite Sirah
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Imagery&lt;/li>
&lt;li>Enkidu&lt;/li>
&lt;/ul>
&lt;h3 id="sweet-wines" >
&lt;div>
Sweet Wines
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Kaz (Port)&lt;/li>
&lt;li>Imagery (Port and Moscato)&lt;/li>
&lt;li>Mayo Family (Port and Late Harvest)&lt;/li>
&lt;li>Cline (Late Harvest)&lt;/li>
&lt;/ul>
&lt;h3 id="champagne" >
&lt;div>
Champagne
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Cline&lt;/li>
&lt;li>Jacuzzi&lt;/li>
&lt;li>Gloria Ferrer&lt;/li>
&lt;li>Kenwood&lt;/li>
&lt;/ul>
&lt;h2 id="location" >
&lt;div>
Location
&lt;/div>
&lt;/h2>
&lt;h3 id="glen-ellen" >
&lt;div>
Glen Ellen
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Mayo Family&lt;/li>
&lt;li>Benziger&lt;/li>
&lt;li>Audelessa&lt;/li>
&lt;li>Imagery&lt;/li>
&lt;/ul>
&lt;h3 id="highway-12" >
&lt;div>
Highway 12
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Imagery&lt;/li>
&lt;li>Enkidu&lt;/li>
&lt;li>Kenwood&lt;/li>
&lt;li>Chateau St. Jean&lt;/li>
&lt;li>Kaz&lt;/li>
&lt;/ul>
&lt;h3 id="sonoma-downtown" >
&lt;div>
Sonoma Downtown
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Highway 12&lt;/li>
&lt;li>Hawkes&lt;/li>
&lt;li>Sojourn&lt;/li>
&lt;li>Gundlach Bundschu&lt;/li>
&lt;/ul>
&lt;h3 id="napa" >
&lt;div>
Napa
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Frogs Leap&lt;/li>
&lt;li>Conn Creek&lt;/li>
&lt;li>Flora Springs&lt;/li>
&lt;/ul></description></item><item><title>Past top content</title><link>/content/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/content/</guid><description>&lt;ol>
&lt;li>&lt;a href="/2022/05/18/Unfinished-Business-with-Postgres/">Unfinished business with Postgres&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.craigkerstiens.com/2019/03/13/give-me-back-my-monolith/">Give me back my monolith&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2013/03/31/why-i-blog/">Why I Blog&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2013/08/13/the-rule-of-thirds-followup/">Rule of thirds - roadmap planning&lt;/a>&lt;/li>
&lt;/ol>
&lt;h1 id="popular-postgres-posts" >
&lt;div>
Popular Postgres Posts
&lt;/div>
&lt;/h1>
&lt;ol>
&lt;li>&lt;a href="/2012/04/30/why-postgres/">Why Postgres&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2012/05/07/why-postgres-part-2/">Why Postgres Part 2&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2013/01/10/more-on-postgres-performance/">More on Postgres Performance&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2012/10/01/understanding-postgres-performance/">Understanding Postgres Performance&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2014/02/02/Examining-PostgreSQL-9.4/">Examining Postgres 9.4 - A first look&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2013/02/13/How-I-Work-With-Postgres/">How I Work with Postgres&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2013/11/18/best-postgres-feature-youre-not-using/">The best Postgres feature you&amp;rsquo;re not using - CTEs&lt;/a>&lt;/li>
&lt;/ol>
&lt;h1 id="popular-marketing-posts" >
&lt;div>
Popular Marketing Posts
&lt;/div>
&lt;/h1>
&lt;ol>
&lt;li>&lt;a href="/2014/01/28/where-to-go-developer-content/">Where to go with developer content&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2013/04/12/perspective-on-developer-marketing/">Doing Marketing (for developers) Differently&lt;/a>&lt;/li>
&lt;li>&lt;a href="/startupbootstrapped-marketing-recap/">Startup/Bootstrapped Marketing Recap&lt;/a>&lt;/li>
&lt;/ol>
&lt;h1 id="popular-heroku-posts" >
&lt;div>
Popular Heroku Posts
&lt;/div>
&lt;/h1>
&lt;ol>
&lt;li>&lt;a href="/2011/11/07/how-heroku-works-maker-day/">How Heroku Works - Maker&amp;rsquo;s Day&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2011/11/02/how-heroku-works-teams-tools/">How Heroku Works - Teams and Tools&lt;/a>&lt;/li>
&lt;li>&lt;a href="/2011/12/02/how-heroku-works-hiring/">How Heroku Works - Hiring&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Speaking</title><link>/about/speaking/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/speaking/</guid><description>&lt;p>I frequently give talks at various tech conferences and meetups. I&amp;rsquo;ve given talks about Postgres/databases, Python/Django, and cultural talks around effective engineering and product teams. Currently I&amp;rsquo;m interested in speaking about Postgres, product management, and marketing for developer-focused companies. If you&amp;rsquo;re interested in having me speak at a meetup or upcoming conference, please reach out to me at craig.kerstiens at gmail.com.&lt;/p>
&lt;h3 id="upcoming-talks" >
&lt;div>
Upcoming Talks
&lt;/div>
&lt;/h3>
&lt;h3 id="past-talks" >
&lt;div>
Past Talks
&lt;/div>
&lt;/h3>
&lt;p>2013-04-02 - &lt;a href="http://greatwideopen.org/">Great Wide Open&lt;/a>&lt;/p>
&lt;p>2013-04-04 - &lt;a href="www.ancientcityruby.com">Ancient City Ruby&lt;/a>&lt;/p>
&lt;p>2013-04-15 - &lt;a href="https://us.pycon.org/2014/">PyCon&lt;/a>&lt;/p>
&lt;p>2014-02-23 - &lt;a href="http://pytennessee.org">PyTennessee&lt;/a> - Going beyond limits of Django ORM with Postgres&lt;/p>
&lt;p>2014-02-01 - &lt;a href="https://fosdem.org/2014/">FOSDEM&lt;/a> - Postgres Performance for Humans&lt;/p>
&lt;p>2013-11-13 - &lt;a href="http://www.salesforce.com/dreamforce/DF13/">Dreamforce&lt;/a>&lt;/p>
&lt;p>2013-10-29 - &lt;a href="http://2013.pgconf.eu/">PgConf EU&lt;/a>&lt;/p>
&lt;p>2013-09-17 - &lt;a href="http://postgresopen.org/2013/home/">Postgres Open&lt;/a>&lt;/p>
&lt;p>2013-07-03 - &lt;a href="https://ep2014.europython.eu/en/">EuroPython&lt;/a>&lt;/p>
&lt;p>2013-05-15 - &lt;a href="http://2013.djangocon.eu/">DjangoCon EU&lt;/a>&lt;/p>
&lt;p>2013-04-04 - &lt;a href="http://mtnwestrubyconf.org/2013">Mountain West Ruby Conf&lt;/a> - Postgres Demystified&lt;/p>
&lt;p>2013-03-18 - &lt;a href="http://us.pycon.org/2013/">PyCon US&lt;/a> - Going beyond the Django ORM limitations with Postgres&lt;/p>
&lt;p>2013-02-02 - &lt;a href="http://fosdem.org/">Fosdem&lt;/a> - &lt;a href="https://speakerdeck.com/craigkerstiens/postgres-demystified">Postgres Demystified&lt;/a>&lt;/p>
&lt;p>2013-01-31 - &lt;a href="http://monkigras.com/">Monkigras&lt;/a> - &lt;a href="https://speakerdeck.com/craigkerstiens/coffee-as-collaboration">Coffee as Collaboration&lt;/a>&lt;/p>
&lt;p>2012-11-23 - &lt;a href="http://allyourbaseconf.com">All Your Base Conf&lt;/a> - &lt;a href="https://speakerdeck.com/craigkerstiens/postgres-demystified">Postgres Demystified&lt;/a> &lt;a href="http://vimeo.com/56682925">Video&lt;/a>&lt;/p>
&lt;p>2012-11-15 - PyCon Argentina - &lt;a href="https://speakerdeck.com/craigkerstiens/django-apps-to-services">Django Apps to Services&lt;/a>&lt;/p>
&lt;p>2012-11-13 - PgDay Argentina - &lt;a href="https://speakerdeck.com/craigkerstiens/postgres-demystified">Postgres Demystified&lt;/a>&lt;/p>
&lt;p>2012-07-16 - OSCON - &lt;a href="https://speakerdeck.com/craigkerstiens/django-apps-to-services">Django Apps to Services&lt;/a> &lt;a href="http://www.infoq.com/presentations/Postgres-Introduction">Video&lt;/a>&lt;/p>
&lt;p>2012-06-28 - CloudEast - &lt;a href="https://speakerdeck.com/craigkerstiens/12-factor-for-python">12 Factor App&lt;/a>&lt;/p>
&lt;p>2012-06-06 - DjangoCon EU - &lt;a href="https://speakerdeck.com/craigkerstiens/how-heroku-uses-heroku-to-build-heroku">How Heroku Uses Heroku to Build Heroku&lt;/a>&lt;/p>
&lt;p>2012-04-15 - DjangoCong - &lt;a href="https://speakerdeck.com/craigkerstiens/djangocong-apps-to-services">Django Apps to Services&lt;/a>&lt;/p></description></item><item><title>Tahoe Donner</title><link>/about/davos/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/davos/</guid><description>&lt;p>Wifi: ToC&lt;/p>
&lt;p>Password: RollTide&lt;/p>
&lt;p>&lt;em>The wifi is an Eero setup near the TV with an extender plugged in upstairs in the loft.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Upon Arrival&lt;/strong>:&lt;/p>
&lt;p>&lt;em>In winter&lt;/em>&lt;/p>
&lt;p>Water will be off. The water shutoff is in the wooden box in the mudroom. You may be able to turn it with hands only, though you may need to use the wrench (lying right there on top) on the lower portion for extra leverage.&lt;/p>
&lt;p>&lt;strong>Before departing&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Run dishes on rinse cycle - no need to run for a full cycle as we know coordinating turning the water off can be a pain&lt;/li>
&lt;li>Strip beds and leave dirty sheets in the laundry basket&lt;/li>
&lt;li>Lock back door (with foot lock)&lt;/li>
&lt;li>Lock front door&lt;/li>
&lt;li>Set thermostat to 55&lt;/li>
&lt;li>Open cabinets below the sinks&lt;/li>
&lt;li>Turn on sink to drain pipes&lt;/li>
&lt;li>Turn off water&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Water&lt;/strong>:&lt;/p>
&lt;p>Main water shutdown is located in the wooden bench inside the mudroom.&lt;/p>
&lt;p>&lt;img src="https://p197.p4.n0.cdn.getcloudapp.com/items/mXupdW75/4970fcad-b0f7-4ce5-9c20-5575dce7f136.jpeg?v=58e8e04d017d7179119738f3d38f1408" alt="">&lt;/p>
&lt;p>If you have trouble turning it, use the wrench on the lower half for extra leverage.&lt;/p>
&lt;p>&lt;strong>Fireplace&lt;/strong>:&lt;/p></description></item><item><title>Travel/Wine</title><link>/about/travel_wine/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/travel_wine/</guid><description>&lt;p>My hobbies include food, wine, and travel. Yes, wine is a hobby. As I frequently get asked for recommendations on wine or travel, I&amp;rsquo;ve started to condense them into various simple lists.&lt;/p>
&lt;h3 id="city-guides" >
&lt;div>
City Guides
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="/about/huntsville/">Huntsville, AL&lt;/a> - The place I still consider home, Rocket City, because well literally its where the rockets happen. Aside from the obvious space and rocket center there&amp;rsquo;s plenty of great southern food and times to be had.&lt;/p>
&lt;p>&lt;a href="/about/neworleans/">New Orleans&lt;/a> - Theres possibly no greater city in the world for the trio of food, alcohol, and best of all music.&lt;/p>
&lt;p>&lt;a href="/about/pasorobles/">Paso Robles&lt;/a> - A lesser visited wine region, though thats gradually changing with it winning wine region of the year in 2013. Much more relaxed than Napa or even Sonoma, yet great wine.&lt;/p>
&lt;p>&lt;a href="/about/sf/">San Francisco&lt;/a> - Hopefully not much needs to be said about the city by the bay. A foodie city with lots of activities, and a woefully out of date list because the pace at which things change around here is exhausting.&lt;/p>
&lt;p>&lt;a href="/about/disneyland/">Disneyland&lt;/a> - As my wife is a large Disney fan and now with a toddler, we frequent Disney. As we frequent it we tend to hit up a few of the extra things such as food and alcohol of course, here&amp;rsquo;s some of our faves.&lt;/p>
&lt;h3 id="wine-guides" >
&lt;div>
Wine Guides
&lt;/div>
&lt;/h3>
&lt;p>&lt;a href="/about/wine/">Wine reviews&lt;/a> - Reviews of various wineries I&amp;rsquo;ve attended. Each winery has been attended more than once to have some comparisson between visits. Most are in the Sonoma region, though some outside.&lt;/p>
&lt;p>&lt;a href="/about/wine_route/">Wine routes&lt;/a> - For many reading through lists and lists of reviews and creating your own trip to wine country can be tiring. Here&amp;rsquo;s a few pre-canned recommendations based on taste preference or location to make a visit to wine country easier.&lt;/p>
&lt;p>&lt;a href="/about/memberships/">My Memberships&lt;/a> - Occasionally I&amp;rsquo;m asked where I&amp;rsquo;m a member at and where I frequent.&lt;/p></description></item><item><title>Wine Club Memberships</title><link>/about/memberships/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/about/memberships/</guid><description>&lt;h3 id="sonomanapa" >
&lt;div>
Sonoma/Napa
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Benziger Winery - Good value, standard California varietals/blends&lt;/li>
&lt;li>Sojourn Cellars - Great pinots and cabs&lt;/li>
&lt;li>Iron Horse - Champagne, and great Pinot&lt;/li>
&lt;li>Piña - Grower in Napa, make great Napa cab at a better price point than much of Napa ($85 a bottle typically)&lt;/li>
&lt;/ul>
&lt;h3 id="paso-robles" >
&lt;div>
Paso Robles
&lt;/div>
&lt;/h3>
&lt;ul>
&lt;li>Lone Madrone - Really fun set of blends here&lt;/li>
&lt;li>Tablas Creek - Classic Rhone style wines, good price point&lt;/li>
&lt;/ul></description></item></channel></rss>