
Dynamic tags applied like health checks. #1048

Open

fidian opened this issue Jun 19, 2015 · 57 comments
Labels
theme/service-metadata (Anything related to management/tracking of service metadata), thinking (More time is needed to research by the Consul Contributors)

Comments

@fidian

fidian commented Jun 19, 2015

In issue #867 I suggested an idea to make tags that depend on the result of scripts, just like health checks.

I run mongo in the cloud with multiple machines all spun up from the same image. On boot they will query for mongodb.service.consul and join the cluster. That all works flawlessly. Being a good Ops person, I have a cron job that kills random machines in my infrastructure at random times. It will eventually hit the mongodb master, the system will hiccup, and a slave will be promoted automatically. Life is fantastic.

In comes Legacy Software that must connect directly to the master mongodb instance. I would like to have master.mongodb.service.consul resolve to the one IP of the master in the cluster.

Current solution (runs via cron on all machines; a rough sketch follows the list):

  1. Get my service definition through API
  2. Check the status of the cluster. This determines if we should or should not have a tag.
  3. Determine if the service definition's tag list needs to be updated.
  4. If an update is required, POST data back to the API.
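
A rough sketch of that cron job, using the official Go API client (github.com/hashicorp/consul/api); only the "mongodb" service name and the mongo-is-master.sh helper come from this issue, everything else is illustrative:

package main

import (
    "log"
    "os/exec"

    "github.com/hashicorp/consul/api"
)

// isMongoMaster runs the helper script; exit code 0 means
// "this node is currently the master".
func isMongoMaster() bool {
    return exec.Command("/usr/local/bin/mongo-is-master.sh").Run() == nil
}

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    // 1. Get the current service definition through the agent API.
    services, err := client.Agent().Services()
    if err != nil {
        log.Fatal(err)
    }
    svc, ok := services["mongodb"]
    if !ok {
        log.Fatal("mongodb service is not registered")
    }

    // 2. and 3. Check the cluster status and decide whether the tag list
    // needs to be updated.
    wantMaster := isMongoMaster()
    hasMaster := false
    tags := make([]string, 0, len(svc.Tags)+1)
    for _, t := range svc.Tags {
        if t == "master" {
            hasMaster = true
            continue
        }
        tags = append(tags, t)
    }
    if wantMaster {
        tags = append(tags, "master")
    }
    if wantMaster == hasMaster {
        return // nothing to update
    }

    // 4. POST the updated definition back to the API (re-register).
    // A real script would also carry over the health checks, omitted here.
    err = client.Agent().ServiceRegister(&api.AgentServiceRegistration{
        ID:      svc.ID,
        Name:    svc.Service,
        Tags:    tags,
        Port:    svc.Port,
        Address: svc.Address,
    })
    if err != nil {
        log.Fatal(err)
    }
}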

Ideal solution:

  1. Set up my service definition with dynamic tags.
  2. Write a script that returns the status of the cluster, with an exit code of 0 meaning to apply the tag.
  3. Let consul update itself automatically.

Sample JSON (one static tag, one dynamic tag):

{
    "service": {
        "name": "mongodb",
        "tags": [
            "fault-tolerant",
            {
                "name": "master",
                "script": "/usr/local/bin/mongo-is-master.sh",
                "interval": "10s"
            }
        ],
        "address": "127.0.0.1",
        "port": 8000,
        "checks": [
            {
                "script": "/usr/local/bin/mongo-health-check.sh",
                "interval": "10s"
            }
        ]
    }
}

This sort of solution could apply to issues #155 and #867, and possibly others.

@ryanuber
Member

Interesting idea. I think the work-around you mentioned is a decent way of doing this, but I'm going to leave this open as a thought ticket for now. Thanks!

@ryanuber added the thinking (More time is needed to research by the Consul Contributors) label on Jun 22, 2015
@Kosta-Github

@fidian with respect to your statement: "On boot they will query for mongodb.service.consul and join the cluster."

Can you describe this a bit more, since I want to set up something similar for a redis cluster? Do you use a handcrafted script (e.g., via consul-template or the REST API) to query mongodb.service.consul and get all registered nodes for that service, or are you relying on the DNS mechanism? At least one problem with relying solely on DNS is that if the node registers itself (e.g., with registrator) in the consul cluster before it does the DNS lookup for mongodb.service.consul, it might get back its own IP address, which would not be helpful for joining the cluster... :-)

@walrusVision

This would be useful for services like zookeeper, which dynamically elect a leader node among themselves every time a node joins or leaves the cluster, and where the leader can be configured to stop accepting client connections. Having dynamic tags applied via a check would make it so I could query consul for the non-leader nodes and never have a client trying to connect to the leader at all.

@fidian
Author

fidian commented Jun 23, 2015

@Kosta-Github asked how I manage to auto-cluster my mongo instances.

  1. Consul is hooked up through dnsmasq.
  2. Consul is started before mongo.
  3. The health check fails unless mongo reports success and mongo is part of a cluster. This second part is vital - the health check fails until mongo is in a cluster.
  4. The init script for mongo queries DNS for other members in the cluster. This will only report mongo instances that are already in a replica set.
  • If IPs are found, become a slave and connect to the IP that we found.
  • With no IPs, configure as a master and enable the replica set, which then makes the health check pass.

The only snag is that I must start one instance of mongo initially so it will bootstrap the replica set. Once it is running I am able to add and remove instances in my replica set.
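
A hypothetical sketch of that boot-time logic in Go (the mongo shell commands and the node's own address are illustrative, not taken from fidian's actual scripts):

package main

import (
    "log"
    "net"
    "os/exec"
)

// mongoEval runs a JavaScript snippet against a mongo instance.
func mongoEval(host, js string) error {
    return exec.Command("mongo", "--quiet", "--host", host, "--eval", js).Run()
}

func main() {
    // Only instances whose health check passes (i.e. that are already part
    // of a replica set) resolve here, because of step 3 above.
    ips, err := net.LookupIP("mongodb.service.consul")
    if err != nil || len(ips) == 0 {
        // No members found: bootstrap the replica set on this node,
        // which then makes the health check pass.
        if err := mongoEval("127.0.0.1", "rs.initiate()"); err != nil {
            log.Fatal(err)
        }
        return
    }
    // Members exist: ask the cluster to add this node as a secondary.
    // (rs.add must run on the primary; a real script would locate it first.)
    self := "10.0.0.5:27017" // placeholder for this node's own address
    if err := mongoEval(ips[0].String(), `rs.add("`+self+`")`); err != nil {
        log.Fatal(err)
    }
}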

@Kosta-Github

@fidian thanks for the explanation; just one more question: what does your dnsmasq config look like? :-)

@fidian
Author

fidian commented Jun 23, 2015

@Kosta-Github it looks like the following. I'm also happy to answer questions off this issue. Feel free to email me directly at [email protected] so we don't continue to pollute this thread.

server=/consul./127.0.0.1#8600
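
(This forwards any lookup under the consul. domain, such as mongodb.service.consul, to the local Consul agent's DNS interface on 127.0.0.1 port 8600.)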

@igoratencompass

+1 for this feature request

@eloycoto

+1

4 similar comments
@hugochinchilla

+1

@xakraz

xakraz commented Aug 20, 2015

+1

@jh409

jh409 commented Sep 24, 2015

+1

@adbourne

adbourne commented Oct 8, 2015

+1

@memelet

memelet commented Nov 28, 2015

This would be very, very nice. There are all kinds of things for which clients need to connect to the master explicitly. A dynamic tag would be so elegant, and so much better than a bunch of ad hoc scripts to tweak tags.

@danielbenzvi

+1

1 similar comment
@wyhysj

wyhysj commented Dec 8, 2015

+1

@123BLiN

123BLiN commented Dec 14, 2015

+1, a tag plus a script would be very useful to implement custom DNS response logic

@richard-hulm

Currently I have to run two 'services' for a similar situation: a "redis" service which includes all nodes in the cluster, and then a "redis-master" service.

This has the unfortunate side effect that most of the redis nodes are always 'failing' the health check because they're not the master.

Would definitely appreciate this feature as a way around this.

@slackpad
Contributor

Consul 0.6 added a "tag override" feature that's useful for implementing schemes like this, though the logic runs outside of Consul, not from Consul itself as suggested here. Here's the issue that brought it in: #1102.

Here's a bit of the documentation, from https://www.consul.io/docs/agent/services.html:

The enableTagOverride can optionally be specified to disable the anti-entropy feature for this service. If enableTagOverride is set to TRUE then external agents can update this service in the catalog and modify the tags. Subsequent local sync operations by this agent will ignore the updated tags.

This would let an external agent, like a script working with redis-sentinel, apply the tags to the current master via Consul's catalog API.
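
For example, a minimal sketch with the Go client, assuming the redis service was registered locally with "enableTagOverride": true and assuming the node name/address; a sentinel-triggered script could push the tag like this:

package main

import (
    "log"

    "github.com/hashicorp/consul/api"
)

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    // Update only the tags of the existing catalog entry for the new master.
    // SkipNodeUpdate leaves the node's own data untouched.
    _, err = client.Catalog().Register(&api.CatalogRegistration{
        Node:           "redis-node-1", // assumed node name
        Address:        "10.0.0.11",    // assumed node address
        SkipNodeUpdate: true,
        Service: &api.AgentService{
            ID:                "redis",
            Service:           "redis",
            Tags:              []string{"master"},
            Port:              6379,
            EnableTagOverride: true,
        },
    }, nil)
    if err != nil {
        log.Fatal(err)
    }
}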

@mvanderlee

+1 Would love to see this instead of the workaround with tag overriding.

@jcua

jcua commented Apr 19, 2016

+1

1 similar comment
@PedroAlvarado

+1

@onnimonni

This is a brilliant idea :), I would also want this for a redis cluster!

@nickwales
Contributor

nickwales commented May 17, 2016

+1 This would give us the ability to determine which application version should receive LB traffic in marathon.

@rafaelcapucho

+1

1 similar comment
@tomwganem

+1

avdva added a commit to avdva/consul that referenced this issue Sep 15, 2016
avdva added a commit to avdva/consul that referenced this issue Sep 15, 2016
avdva added a commit to avdva/consul that referenced this issue Sep 15, 2016
@avdva

avdva commented Sep 16, 2016

Hi,
I've added support for dynamic tags here, branch dynamic-tags.
If you are interested in this feature, please build and test it, any critique is appreciated. If everything is ok, I'll make a PR.
The syntax for service registration is the following:

{
    "service": {
        "name": "mongodb",
        "tags": ["tag1"],
        "dynamictags": [
            {
                "name": "master",
                "script": "/usr/local/bin/mongo-is-master.sh",
                "interval": "10s"
            }
        ],
        "address": "127.0.0.1",
        "port": 8000,
        "checks": [
            {
                "script": "/usr/local/bin/mongo-health-check.sh",
                "interval": "10s"
            }
        ]
    }
}

@andremarianiello

This feature would be great for my use case. I would really like to see this merged in eventually.

@slackpad added the theme/service-metadata (Anything related to management/tracking of service metadata) label on May 2, 2017
@Sieabah

Sieabah commented Jun 23, 2017

+1. Even with the two-service method, changes take 15 to 30 minutes to propagate to the UI, API, and DNS.

@slackpad
Contributor

@Sieabah that sounds like a function of DNS caching somewhere - you can adjust the TTL value to maybe improve that. The API/UI shouldn't have any delay.

@Sieabah

Sieabah commented Jun 23, 2017

@slackpad I have all of the DNS caching set to 0. Querying the API and ignoring DNS takes about the same amount of time to resolve.

I'm sure there is something misconfigured, as when I monitor the two boxes they're saying "synced service:mongo" and "synced service:primary-mongo". With the current service definition I'm able to get it down to 5 minutes. During that time both services actually say they're the primary (in the UI and API) even though in the logs they switch immediately.

{
  "service": {
    "name": "primary-mongo",
    "tags": ["primary", "mongo"],
    "port": 27017,
    "check": {
      "name": "primary",
      "script": "python ~/consul_check_tags.py $(mongo --eval 'db.isMaster().ismaster' | grep 'true')",
      "interval": "5s",
      "timeout": "1s"
    }
  }
}

I've tried re-registering via the API, reloading the config during the health check, and reloading from the API. I don't know what is making it take 5 minutes to propagate to a cluster of 3 servers and 2 clients, other than the anti-entropy timeout of syncing only every 1 minute?

@slackpad
Contributor

@Sieabah we have a few issues we are looking into like #2970, but it may be worthwhile for you to open a new GH issue so we can try to track down what's happening to you. Better to do it on a different issue than this one.

@caquino

caquino commented Aug 22, 2017

👍 this would simplify a lot of the "workarounds" we did to achieve this master/slave tag functionality

@adamlc

adamlc commented Nov 6, 2017

Did we get anywhere with this? I'm looking for something similar at the moment where I have a service that has a master / slave type setup.

@ramukima

Not sure if Prepared Queries can be used to apply such rules. However, dynamic tagging is a good idea. Any plans to get it in?

@drawks
Contributor

drawks commented Jan 7, 2019

This still looks like a great idea, but I see no indication of any traction to having it merged. Anyone care to give us an update?

@nicholasamorim

This still looks like a great idea, but I see no indication of any traction to having it merged. Anyone care to give us an update?

@ShimmerGlass
Contributor

This would be useful for us as well. What are your thoughts on how to design this?
IMO a simple way would be to add a field like "output_as_tag": true to the check declaration struct. When set to true, the check output (as seen in Output in a /v1/health/service/<service> query, for example) would be captured and set as a tag, either on the node for a node check, or on the service for a service check.
If the value changes, the previously set tag would be removed and the new one added.
These tags would also be applied to sidecar services to ensure compatibility with Connect.
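
For example, a check declaration under this proposal might look like the following (the output_as_tag field does not exist in Consul today, and the script name is made up; the script would simply print "leader" or "follower"):

{
    "check": {
        "name": "zookeeper-role",
        "script": "/usr/local/bin/zookeeper-role.sh",
        "interval": "10s",
        "output_as_tag": true
    }
}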

There are a few points to address though:

  • If a command goes crazy and outputs 3 KB we probably don't want that as a tag, so some filtering is probably needed.
  • We need to store the fact that a tag comes from a check: if the output changes from "leader" to "follower", we need to remove the "leader" tag and add the "follower" tag. This needs to survive agent restart and be stored in the agent state files on disk, possibly requiring an update to the file format.
  • This should also work for HTTP checks, especially since command checks can be dangerous (we have them disabled in our shop). Do we want the bodies of 50x responses as tags? I'd argue that we probably don't, since in many cases a 50x returns a default "it's not working" HTML page, so dropping the tag on connection refused / HTTP error codes seems the most sensible to me.

@pierresouchay
Contributor

@Aestek I agree, this kind of feature would be really useful. For now we have lots of services such as:

  • mycluster-zookeeper (all green in normal circumstances)
  • mycluster-zookeeper-leader (which has the same members, but different checks, meaning that mycluster-zookeeper-leader always has 4 instances in warning state and 1 passing)

Having a way to merge those services into 1 single service and just add a leader tag would be great.

I know several systems where the checks for this kind of feature can also be simple HTTP checks, so limiting it to scripts is a bit less interesting.
I am not convinced by scraping the output of regular checks to get the new tags because:

  • If you can set several checks with this value, what happens when one disagrees with another check?
  • How do you add/remove existing tags?

The approach in #1048 (comment) looks sensible (I mean, not linked to existing checks), because:

  • for each dynamic check, it describes explicitly what tag would be added
  • it avoids conflicts between the outputs of several checks not agreeing on tags
  • it would not change the output of existing checks
  • it would allow checks to be not only scripts, but HTTP, TCP and so on

I did not check in detail what has been done in https://github.com/avdva/consul/tree/dynamic-tags but it sounds to me like the right approach. While limiting the ability to have very dynamic things, it would greatly ease implementation (most notably by avoiding conflicts between several checks).
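
Purely as an illustration of that approach (this syntax does not exist in Consul; it just extends the dynamictags sketch above to a non-script check, with a made-up endpoint):

{
    "service": {
        "name": "mycluster-zookeeper",
        "dynamictags": [
            {
                "name": "leader",
                "http": "http://localhost:8080/commands/leader",
                "interval": "10s"
            }
        ]
    }
}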

@avdva

avdva commented Oct 14, 2019

I'll try to resurrect my branch soon. We'll see if it still works.

@pierresouchay
Contributor

@avdva we are really interested in this, tell us when you do so ;)

@ShimmerGlass
Contributor

@avdva Did you have time to resurrect your branch? Hope it's not too complicated with all the conflicts there must be since 2016.

@RedStalker

+1

@hanshasselberg
Member

This is an interesting idea and I could imagine us adding such a feature. The best way to get it in is to create a PR so that we have something to discuss. That would also make it easier to see the impact.

@exFalso

exFalso commented Mar 10, 2020

+1. I think @ShimmerGlass's suggestion is great: the tag should come from the script itself. This covers the OP's use case but would also solve additional ones.
In our case, we have a dynamically generated ID in some of our services (the ID comes from dedicated hardware and must be generated within the service), and it'd be great if we could propagate this ID to consul. A great way to solve this is to have a periodic script return the tag(s) to be applied.

@chris93111

+1

1 similar comment
@EmPRio93

EmPRio93 commented Oct 5, 2020

+1

@benvanstaveren

I'll throw my 2 cents in (in 2021) and say that this feature would still be very much welcome, because it would replace at least 4 of my setups that currently rely on a bunch of external scripts to update tags.

@nbari

nbari commented Oct 8, 2021

How does Vault mark the service as active, standby, or initialized? Does it re-register the service?

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021
@blake
Member

blake commented Jan 22, 2022

How does Vault mark the service as active, standby, or initialized? Does it re-register the service?

Yes, Vault re-registers the service in order to update any parameters that changed since the last registration. See the reconcileConsul function in [vault/serviceregistration/consul/consul_service_registration.go](https://github.com/hashicorp/vault/blob/v1.9.2/serviceregistration/consul/consul_service_registration.go) for details.
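
The same pattern can be reproduced with the Go client; a minimal sketch (not Vault's actual code; the service ID and port are assumed):

package roletag

import "github.com/hashicorp/consul/api"

// updateRoleTag re-registers the service whenever the node's role changes,
// swapping the "active"/"standby" tag. This mirrors the re-registration
// pattern described above.
func updateRoleTag(client *api.Client, active bool) error {
    tag := "standby"
    if active {
        tag = "active"
    }
    return client.Agent().ServiceRegister(&api.AgentServiceRegistration{
        ID:   "vault", // assumed service ID
        Name: "vault",
        Tags: []string{tag},
        Port: 8200,
    })
}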

@xenofree

We are really interested in this; any chance of seeing this feature in the future?

@moonbaseDelta

I can't understand why the hell this feature has been missing for years while it's even been half-implemented already.

@sri4kanne

Came across this issue while we were trying to implement a similar solution through other means. It would be a good enhancement to have for sure!!

@drawks
Contributor

drawks commented Nov 19, 2024

This issue has now been open for almost a decade.

Can someone from HashiCorp please indicate whether this is actually on anyone's radar for planning? It is the lukewarmest of feedback to give this issue the "thinking" tag after it had been open for a week and then not do ANYTHING with the request for 9 years, even with people suggesting actual implementations. There is basically no feedback about why this may or may not be considered or rejected for inclusion in the project.

With nearly 50 participants in the issue, you'd think that HashiCorp could manage to at least have some meaningful engagement with the open source community.

I'm gonna tag some recent contributors with merge/commit access, as well as some execs, to maybe prompt some actual engagement with this issue from anyone: @sarahalsmiller @rboyer @xwa153 @dhiaayachi @jmurret @Amier3 @armon
