hack/local-up-cluster.sh broken on latest 1.5 branch #38847

Closed
janetkuo opened this issue Dec 16, 2016 · 17 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

@janetkuo
Member

janetkuo commented Dec 16, 2016

local cluster failed to start in the latest 1.5 branch (commit 09cb1a9):

$ hack/local-up-cluster.sh
WARNING : This script MAY be run as root for docker socket / iptables functionality; if failures occur, retry as root.
make: Entering directory `/usr/local/google/home//go/src/k8s.io/kubernetes'
make[1]: Entering directory `/usr/local/google/home//go/src/k8s.io/kubernetes'
make[1]: Leaving directory `/usr/local/google/home//go/src/k8s.io/kubernetes'
make[1]: Entering directory `/usr/local/google/home//go/src/k8s.io/kubernetes'
+++ [1215 15:40:40] Building the toolchain targets:
    k8s.io/kubernetes/hack/cmd/teststale
    k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1215 15:40:40] Generating bindata:
    test/e2e/framework/gobindata_util.go
+++ [1215 15:40:44] Building go targets for linux/amd64:
    cmd/libs/go2idl/deepcopy-gen
+++ [1215 15:40:52] Building the toolchain targets:
    k8s.io/kubernetes/hack/cmd/teststale
    k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1215 15:40:52] Generating bindata:
    test/e2e/framework/gobindata_util.go
+++ [1215 15:40:55] Building go targets for linux/amd64:
    cmd/libs/go2idl/defaulter-gen
+++ [1215 15:41:02] Building the toolchain targets:
    k8s.io/kubernetes/hack/cmd/teststale
    k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1215 15:41:02] Generating bindata:
    test/e2e/framework/gobindata_util.go
+++ [1215 15:41:06] Building go targets for linux/amd64:
    cmd/libs/go2idl/conversion-gen
+++ [1215 15:41:14] Building the toolchain targets:
    k8s.io/kubernetes/hack/cmd/teststale
    k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1215 15:41:14] Generating bindata:
    test/e2e/framework/gobindata_util.go
+++ [1215 15:41:17] Building go targets for linux/amd64:
    cmd/libs/go2idl/openapi-gen
make[1]: Leaving directory `/usr/local/google/home//go/src/k8s.io/kubernetes'
+++ [1215 15:41:26] Building the toolchain targets:
    k8s.io/kubernetes/hack/cmd/teststale
    k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1215 15:41:26] Generating bindata:
    test/e2e/framework/gobindata_util.go
+++ [1215 15:41:29] Building go targets for linux/amd64:
    cmd/kubectl
    cmd/hyperkube
make: Leaving directory `/usr/local/google/home//go/src/k8s.io/kubernetes'
API SERVER insecure port is free, proceeding...
API SERVER secure port is free, proceeding...
Detected host and ready to start services.  Doing some housekeeping first...
Using GO_OUT /usr/local/google/home//go/src/k8s.io/kubernetes/_output/local/bin/linux/amd64
Starting services now!
Starting etcd
etcd --advertise-client-urls http://127.0.0.1:2379 --data-dir /tmp/tmp.XV5AbQKB0n --listen-client-urls http://127.0.0.1:2379 --debug > "/dev/null" 2>/dev/null
Waiting for etcd to come up.
+++ [1215 15:42:46] On try 2, etcd: : http://127.0.0.1:2379
{"action":"set","node":{"key":"/_test","value":"","modifiedIndex":4,"createdIndex":4}}
Waiting for apiserver to come up
!!! [1215 15:42:56] Timed out waiting for apiserver:  to answer at https://localhost:6443/version; tried 10 waiting 1 between each
Cleaning up...
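
The last log line reports a bounded retry ("tried 10 waiting 1 between each"). A minimal sketch of that kind of readiness poll follows, to make the failure mode easier to reason about. The function name `wait_for` and its argument order are hypothetical and are not taken from the Kubernetes scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the script's real implementation): run a probe
# command up to `tries` times, sleeping `wait` seconds between attempts.
wait_for() {
  local tries="$1" wait="$2"
  shift 2
  local i
  for ((i = 1; i <= tries; i++)); do
    # "$@" is the probe; any non-zero exit status counts as "not ready yet".
    if "$@"; then
      echo "ready on try ${i}"
      return 0
    fi
    sleep "${wait}"
  done
  echo "timed out; tried ${tries} waiting ${wait} between each" >&2
  return 1
}
```

Assuming the probe is something like `wait_for 10 1 curl --fail --silent --insecure https://localhost:6443/version`, a server that is running but rejects unauthenticated requests would typically look the same to this loop as a server that never started, because the probe only reports success or failure.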
@janetkuo janetkuo added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2016
@janetkuo
Member Author

janetkuo commented Dec 16, 2016

It worked in v1.5.0 (58b7c16) and works on HEAD, but is broken in v1.5.1 (82450d0). @saad-ali

@saad-ali
Member

saad-ali commented Dec 16, 2016

Thanks for noticing this @janetkuo

@kubernetes/sig-cluster-lifecycle can you please help investigate and triage this issue or help find the correct owner?

We need to:

  1. Verify that local-up-cluster is indeed broken in 1.5.1.
  2. Determine how severe this is and how quickly we need to get a fix out.
  3. Figure out why it's broken and fix it.
  4. Figure out why the break of this script was not detected by any of our testing.

@janetkuo
Member Author

Update: #38708 is the cause

@saad-ali
Member

saad-ali commented Dec 16, 2016

CC @bprashanth

Local cluster up is a "hack" script we provide. It sucks that it broke in v1.5.1, but we do not need to rush a v1.5.2 out the door to fix it. Instead we will wait for the normal v1.5.2 release schedule (post holiday).

Action items:

  • Update v1.5.1 release notes with a "Known Issues" section and add this issue to it, and, if possible, provide a workaround. @janetkuo
  • Fix this issue for the v1.5.2 release. @deads2k

We need an owner for both these items.

@janetkuo Could you own item 1?

@deads2k Could you own item 2?

@janetkuo
Member Author

> @janetkuo Could you own item 1?

Yes

@deads2k
Contributor

deads2k commented Dec 16, 2016

> @deads2k Could you own item 2?

Yeah. I fixed it in a different pull once before.

@janetkuo
Member Author

> if possible, provide a workaround

Is the following workaround acceptable?

Pass --anonymous-auth=true to sudo -E "${GO_OUT}/hyperkube" apiserver ...

@saad-ali
Member

Thank you both.

> janetkuo commented 3 minutes ago
> Is the following workaround acceptable?
>
> Pass --anonymous-auth=true to sudo -E "${GO_OUT}/hyperkube" apiserver ...

@deads2k?

@deads2k
Contributor

deads2k commented Dec 16, 2016

> Is the following workaround acceptable?
>
> Pass --anonymous-auth=true to sudo -E "${GO_OUT}/hyperkube" apiserver ...

Yeah, that's actually better than what I did with the modification I made. You want to do that instead?

@janetkuo
Member Author

> Update v1.5.1 release notes with a "Known Issues" section and add this issue to it, and, if possible, provide a workaround.

Filed #38884

k8s-github-robot pushed a commit that referenced this issue Dec 17, 2016
Automatic merge from submit-queue

Document known issue for broken local-up-cluster script in 1.5.1

Ref #38847
k8s-github-robot pushed a commit that referenced this issue Dec 19, 2016
Automatic merge from submit-queue

make local-up-cluster.sh match "normal" launch without anonymous access

Ref #38847

This changes the readiness detection, but keeps the defaults matching the "normal" launch without anonymous auth.  

```release-note
Fixes an issue where `hack/local-up-cluster.sh` would fail on the API server start with

!!! [1215 15:42:56] Timed out waiting for apiserver:  to answer at https://localhost:6443/version; tried 10 waiting 1 between each
```

@janetkuo @saad-ali
@lavalamp
Member

I can confirm that without @janetkuo's suggested fix, local-up-cluster doesn't actually produce a functioning cluster. Controller manager is blocked from accessing anything.

@lavalamp
Member

Well, let me back off on that claim. The controller manager seems to be able to talk to apiserver again, but pods don't seem to be able to.

E0126 00:59:10.717842 8 election.go:226] error retrieving resource lock default/etcd-operator: Get https://10.0.0.1:443/api/v1/namespaces/default/endpoints/etcd-operator: dial tcp 10.0.0.1:443: getsockopt: connection timed out

e.g. etcd-operator prints a stream of the above.

@lavalamp
Member

I think the issue is that the apiserver is failing to give the kubernetes service any endpoints.

@lavalamp
Member

E0125 17:10:59.076148    4623 controller.go:162] unable to sync kubernetes service: Endpoints "kubernetes" is invalid: subsets[0].addresses[0].ip: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8)
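
For readers unfamiliar with the validation error above: it rejects any endpoint address inside 127.0.0.0/8. A rough sketch of that range check in bash is shown here purely for illustration. The helper name `is_loopback_ipv4` is hypothetical and this is not the apiserver's actual validation code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the argument is a dotted-quad IPv4
# address whose first octet is 127, i.e. it falls in 127.0.0.0/8.
is_loopback_ipv4() {
  local ip="$1"
  [[ "${ip}" =~ ^127\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]
}

# Compare the address from the error with the service IP from the earlier log.
for ip in 127.0.0.1 10.0.0.1; do
  if is_loopback_ipv4 "${ip}"; then
    echo "${ip}: loopback, not usable as an endpoint address"
  else
    echo "${ip}: outside the loopback range"
  fi
done
```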

@lavalamp
Member

OK, I figured it out.

     if [[ "${API_HOST}" != "127.0.0.1" ]]; then
         advertise_address="--advertise_address=${API_HOST_IP}"
     fi

should be:

     if [[ "${API_HOST_IP}" != "127.0.0.1" ]]; then
         advertise_address="--advertise_address=${API_HOST_IP}"
     fi

because the API_HOST variable is "localhost", not 127.0.0.1.
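
To see the difference between the two conditions, here is a small standalone sketch. It assumes `API_HOST=localhost` (as stated above) and `API_HOST_IP=127.0.0.1` (an assumed default, implied by the loopback error). The two wrapper functions are hypothetical and exist only to compare the checks side by side.

```shell
#!/usr/bin/env bash
# Assumed values for illustration; the real script may derive them differently.
API_HOST="localhost"
API_HOST_IP="127.0.0.1"

# Original check: compares the host NAME against an IP literal, so it is
# true for "localhost" and the loopback IP ends up being advertised.
buggy_advertise_address() {
  local advertise_address=""
  if [[ "${API_HOST}" != "127.0.0.1" ]]; then
    advertise_address="--advertise_address=${API_HOST_IP}"
  fi
  echo "${advertise_address}"
}

# Corrected check: compares the IP itself, so no flag is set for loopback.
fixed_advertise_address() {
  local advertise_address=""
  if [[ "${API_HOST_IP}" != "127.0.0.1" ]]; then
    advertise_address="--advertise_address=${API_HOST_IP}"
  fi
  echo "${advertise_address}"
}

echo "buggy: '$(buggy_advertise_address)'"
echo "fixed: '$(fixed_advertise_address)'"
```

Under these assumed values the original condition yields `--advertise_address=127.0.0.1`, which matches the loopback address rejected in the error earlier in the thread, while the corrected condition leaves the flag empty.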

@lavalamp
Member

I sent #40501 with @janetkuo's fix and my fix.

k8s-github-robot pushed a commit that referenced this issue Jan 26, 2017
Automatic merge from submit-queue

Actually fix local-cluster-up on 1.5 branch

Fixes #38847 (for real)
@lavalamp
Member

This ought to be fixed now.
